Chapter 1
Overview of IRLD Evaluation Research

Over a six-year period, the Institute for Research on Learning
Disabilities (IRLD) at the University of Minnesota conducted research
on evaluation issues, especially as they relate to assessing
educational progress of learning disabled students, identifying
instructionally-relevant evaluation procedures, and using continuous
evaluation in classrooms. Current evaluation practices, alternative
measurement procedures, and the use of data to evaluate students'
programs were studied by means of a systematic research program.

This report describes the results of IRLD studies that provide
information on evaluation procedures, especially as they relate to
students who are receiving special education services. Findings from
separate studies have been integrated to address major issues and to
produce recommendations for practice that are based on research
results. The studies from which the findings and recommendations were
derived used a variety of methodologies. Included among these were:

   • Comparative studies
   • Surveys and interviews
   • Experimental studies
   • Developmental studies
   • Observations
   • Single subject studies
   • Analytical studies
   • Implementation studies

Highlights of Major Findings

The major questions that we asked and the major findings are
presented here in very brief form. Implications of the findings for
practice are discussed in Chapter 2. Details of the evidence that
supports the findings are presented in Chapters 3-10. Information on
the data sources and specific research procedures is presented in
Chapter 11.

Current Evaluation Practices

 1. What do teachers report to be their typical evaluation
    practices?

    a. Most teachers evaluate student progress four times
       during the school year.

    b. Teachers primarily rely on informal observations or
       informal tests to assess student mastery of IEP goals;
       they rarely use systematic evaluation procedures.

    c. The confidence that teachers have regarding the
       accuracy of their judgments about student performance
       is unjustified.

    d. Regardless of the evaluation procedure used, the
       frequency of measurement varies greatly from one
       teacher to the next.



 2. To what extent do teachers use direct and frequent
    measurement procedures for evaluation?

    a. Most special education teachers are familiar with direct
       and frequent measurement strategies, but few use them.

    b. Teachers believe that direct and frequent measurement
       is time consuming and takes away from instructional time.

    c. Teachers who do use direct and frequent measurement
       strategies, on the average, use only a small proportion
       of a student's instructional time.

 3. To what extent do teachers use the information obtained from
    direct and frequent measurement to make instructional
    changes?

    a. Teachers primarily rely on personal observation and
       judgment to make changes in instructional programs.
       Few teachers use direct and frequent evaluation
       strategies to decide about changes in students'
       instructional plans or to decide when to reteach or
       review a skill.

    b. Teachers who are required to use direct and frequent
       measurement strategies make more instructional program
       changes for students than do teachers not required to
       use the strategies.

    c. Changes made by teachers are variable; the most common
       characteristic of changes is the infrequency with which
       they are made.

    d. Training in data evaluation procedures should include a
       focus on appropriate changes to make in instruction,
       motivation, and physical setting.



Reading Evaluation

 4. What are the characteristics of a recommended direct measure
    of reading?

    a. A direct measure of reading should focus on the
       behavior of reading aloud from text. Measures of this
       behavior are technically adequate (valid, reliable, and
       sensitive to student growth), have instructional
       utility, and are logistically feasible in the classroom.
       A second choice behavior to measure is reading aloud from
       word lists.

    b. When assessing a student's level of performance, the
       difficulty level of the direct reading measure should be
       as close as possible to the age-grade appropriate level,
       without reaching a level so frustrating that the measure
       is insensitive to student growth.

    c. When assessing a student's level of performance, reading
       test items (text passages or words) should be selected
       randomly from a mid-sized domain, such as stories or
       words within a basal reader.

    d. When selecting passages from one basal reader, it is
       desirable to select several "parallel" forms.

    e. When assessing a student's mastery within progress
       measurement, the reading mastery criterion should be
       an absolute raw score correct and incorrect criterion;
       a recommended criterion is 50-70 words correct per
       minute, with 7 or fewer errors.
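
The mastery criterion in item 4e can be illustrated with a short sketch. The function names, the 60-second default, and the use of 50 (the lower bound of the recommended 50-70 range) as the threshold are assumptions made here for illustration; they are not part of the IRLD procedures. Errors are treated as a raw count for the sample, which assumes a one-minute reading.

```python
def words_correct_per_minute(words_correct: int, seconds: float) -> float:
    """Convert a raw count of correctly read words to a per-minute rate."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return words_correct * 60.0 / seconds


def meets_reading_criterion(
    words_correct: int,
    errors: int,
    seconds: float = 60.0,      # assumed sample length
    min_wcpm: float = 50.0,     # assumed threshold: low end of 50-70
    max_errors: int = 7,        # "7 or fewer errors"
) -> bool:
    """Return True when both the correct-rate and error limits are met."""
    rate = words_correct_per_minute(words_correct, seconds)
    return rate >= min_wcpm and errors <= max_errors


if __name__ == "__main__":
    print(meets_reading_criterion(words_correct=55, errors=5))
    print(meets_reading_criterion(words_correct=55, errors=8))
```

A teacher applying a stricter standard could pass `min_wcpm=70.0` instead of the default.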

 5. How should the direct reading measure be administered and
    scored?

    a. The duration of a direct reading measure should be from
       one to three minutes each time it is administered.

    b. Reading performance or progress on a direct reading
       measure should be scored in terms of the number of words
       read correctly.

    c. Within an evaluation system, the direct reading measure
       should be administered at least two to three times per
       week.

    d. The determination of whether to measure performance or
       progress should be made in light of individual student
       and teacher needs. Both procedures produce technically
       adequate data.

 6. To what extent are basal reader criterion-referenced tests
    technically adequate?

    Despite the content and face validity of basal reader
    criterion-referenced tests, their technical adequacy is
    often questionable.

Spelling Evaluation

 7. What are the characteristics of a recommended direct measure
    of spelling?

    a. A direct measure of spelling should focus on the behavior
       of writing words dictated from lists. Measures of this
       behavior are technically adequate (valid, reliable, and
       sensitive to student growth), have instructional utility,
       and are logistically feasible in the classroom. A second
       choice behavior to measure is writing compositions.

    b. When assessing a student's level of performance, the
       difficulty level of the direct spelling measure should
       be within one to two grades of the student's
       instructional level.

    c. When assessing a student's level of performance, words
       included in a dictated spelling list should be selected
       randomly from the domain of words in the spelling text
       or basal reader.

 8. How should the direct spelling measure be administered and
    scored?

    a. The duration of a direct spelling measure should be from
       two to three minutes each time it is administered. Paced
       dictation at a rate of 15 seconds per word is an
       acceptable procedure.

    b. Performance on a direct spelling measure should be scored
       in terms of either the number of words spelled correctly
       or the number of letters in correct sequence. Letters
       in correct sequence is preferred for low-functioning
       students.

    c. Within an evaluation system, the direct spelling measure
       should be administered at least two times per week.

    d. The determination of whether to measure performance or
       progress should be made on the basis of individual
       student and teacher needs. The two procedures produce
       similar results.
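
The administration and scoring rules in item 8 can be sketched as follows. Paced dictation at 15 seconds per word over two to three minutes yields 8 to 12 words. The report does not define how "letters in correct sequence" is counted, so the pair-matching rule below (adjacent letter pairs, with word boundaries marked by `^` and `$`) is an assumption chosen for illustration, and all function names are hypothetical.

```python
from collections import Counter


def words_in_dictation(minutes: float, seconds_per_word: float = 15.0) -> int:
    """Number of words that fit in a paced dictation of the given length."""
    return int(minutes * 60 // seconds_per_word)


def words_spelled_correctly(targets: list[str], responses: list[str]) -> int:
    """Count responses that match their dictated word exactly (case-insensitive)."""
    return sum(t.lower() == r.lower() for t, r in zip(targets, responses))


def letter_pairs(word: str) -> list[str]:
    """Adjacent letter pairs, with ^ and $ marking the word boundaries."""
    padded = "^" + word.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def letters_in_correct_sequence(target: str, response: str) -> int:
    """Assumed scoring rule: count letter pairs shared by target and response."""
    shared = Counter(letter_pairs(target)) & Counter(letter_pairs(response))
    return sum(shared.values())


if __name__ == "__main__":
    print(words_in_dictation(2), words_in_dictation(3))
    print(letters_in_correct_sequence("cat", "kat"))
```

Under this assumed rule a misspelling such as "kat" for "cat" still earns partial credit, which is consistent with the report's preference for letters in correct sequence when scoring low-functioning students.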

Written Expression Evaluation

 9. What are the characteristics of a recommended direct measure
    of written expression?

    A direct measure of written expression should focus on
    the behavior of writing compositions in response to a
    verbal stimulus. Certain measures of this behavior
    (total words written, total words spelled correctly, or
    letters in correct sequence) are technically adequate,
    have instructional utility, and are logistically feasible
    in the classroom.

10. How should the direct written expression measure be
    administered and scored?

    a. The duration of a direct written expression measure
       should be three minutes each time it is administered.

    b. Performance on a direct written expression measure
       should be scored in terms of either total number of
       words or number of correctly spelled words.

    c. Within an evaluation system, two or three writing
       samples should be elicited on each measurement occasion.
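
The two scoring options in item 10b can be sketched in a few lines. The tokenizing pattern and the `known_words` set (standing in for whatever spelling reference a scorer uses) are assumptions for illustration only; the report does not specify how a correctly spelled word is verified.

```python
import re


def score_writing_sample(text: str, known_words: set[str]) -> dict[str, int]:
    """Score one writing sample by total words and correctly spelled words.

    known_words is a hypothetical reference list of acceptable spellings,
    stored in lowercase.
    """
    words = re.findall(r"[A-Za-z']+", text)
    correct = sum(1 for w in words if w.lower() in known_words)
    return {"total_words": len(words), "words_spelled_correctly": correct}


if __name__ == "__main__":
    reference = {"the", "dog", "ran", "fast"}
    print(score_writing_sample("The dog ran fastt", reference))
```

Each of the two or three samples elicited on a measurement occasion would be scored separately in this way.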

Oral Language Evaluation

11. What are the characteristics of a recommended direct measure
    of oral language?

    A direct measure of oral language should focus on the
    behavior of describing a picture stimulus.

12. How should the direct oral language measure be administered
    and scored?

    a. Performance on a direct oral language measure should
       be scored in terms of the number of non-repetitive
       words spoken.

    b. The oral language measure should be administered by a
       familiar examiner.

Mathematics Evaluation

13. What are the characteristics of a recommended direct measure
    of mathematics?

    Preliminary data suggest that a direct measure of
    mathematics should focus on the calculation of math
    computation problems.

14. How should the direct mathematics measure be administered and
    scored?

    a. The types of problems presented to a student may be
       determined by the grade level of the student or may
       sample from all types of math functions.

    b. Performance on a direct mathematics measure should be
       scored in terms of the number of digits correct.

    c. Within an evaluation system, several samples should
       be elicited on each measurement occasion.

Social Adjustment Evaluation

15. What are the characteristics of a recommended direct measure
    of social adjustment?

    A direct measure of social adjustment should focus on
    general classroom conduct and social interaction. The
    specific behaviors should be identified within the
    specific setting of interest.

16. How should the direct social adjustment measure be
    administered and scored?

    a. Administration of the direct social adjustment measure
       could involve observation of the target student and
       classmates on an interval-sampling schedule.

    b. Performance could be scored by tallying occurrences of
       the target behaviors.

Data Utilization

17. What are recommended procedures for graphing data?

    a. ... performance to provide information about accuracy of
       performance.

    b. When graphing a student's level of performance, equal
       interval graph paper can be used rather than
       semi-logarithmic chart paper.

    c. When graphing a student's reading or spelling progress
       through a curriculum, number of words spelled or pages
       read should be spaced along the ordinate axis according
       to the time of mastery expected of average students in
       the curriculum.

18. How should graphed data be used to evaluate students'
    programs?

    a. Graphed data should be summarized and interpreted to
       determine whether the instructional program is effective
       or needs to be changed.

    b. Goal-oriented analysis is preferred for monitoring
       progress toward IEP goals, obtaining information about
       when to change a student's instructional program, and
       explaining student progress to parents and other
       teachers.

    c. Program-oriented analysis is preferred for obtaining
       information about what to change in a student's
       instructional program.

    d. A combined goal-oriented and program-oriented procedure
       that is recommended involves drawing a trend line
       through 7 to 10 data points; if the trend is flatter
       than the goal line, a program modification should be
       introduced.
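
The decision rule in item 18d can be sketched as a comparison of two slopes. The report does not say how the trend line is drawn, so ordinary least squares is substituted here as an assumption; the goal line is assumed to run from a baseline score to the goal score over a fixed number of measurement sessions. All names are hypothetical.

```python
def trend_slope(scores: list[float]) -> float:
    """Least-squares slope of scores against session index 0, 1, 2, ..."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def goal_slope(baseline: float, goal: float, sessions: int) -> float:
    """Slope of the assumed goal line: gain required per session."""
    return (goal - baseline) / sessions


def needs_program_change(
    scores: list[float], baseline: float, goal: float, sessions: int
) -> bool:
    """True when the trend through 7 to 10 points is flatter than the goal line."""
    if not 7 <= len(scores) <= 10:
        raise ValueError("rule is stated for 7 to 10 data points")
    return trend_slope(scores) < goal_slope(baseline, goal, sessions)


if __name__ == "__main__":
    weekly_scores = [20, 21, 22, 23, 24, 25, 26]
    print(needs_program_change(weekly_scores, baseline=20, goal=60, sessions=20))
```

In this example the student gains one word per session while the goal line requires two, so the rule signals that a program modification should be introduced.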



19. How should teachers be trained to use data for judging
    intervention effectiveness and improving student performance?

    a. Direct inservice workshop training, rather than self
       instruction, is recommended for training teachers to
       collect data frequently and to use the data to make
       instructional decisions.

    b. Systematic procedural changes can increase teachers'
       efficiency in using direct and frequent measurement
       procedures.

    c. Direct training of teachers in measurement activities
       is more likely to result in teacher use and efficiency
       than training through manuals alone.

    d. Goal setting is integral to progress measurement
       activities; teachers should monitor student performance
       in relation to short-term objectives rather than
       long-term goals.

    e. Direct and frequent measurement with curriculum-based ...

20. To what extent do measurement and data utilization by
    teachers affect students' learning?

    a. Student performance increases more when teachers use
       specific data-utilization rules to monitor progress than
       when they rely on their own judgment about student
       progress.

    b. The quality of instruction improves when teachers use
       direct and frequent measurement and evaluation.

    c. Students' knowledge about their goals and progress is
       greater when teachers employ direct and frequent
       measurement and evaluation.

    d. Measurement appears to be a necessary condition in
       producing student growth, but not a sufficient one;
       positive effects of measurement cannot be sustained
       unless data-utilization procedures also are used.

... implications for educators. Some of the general implications are
discussed in this chapter. More specific implications can be found in
IRLD research reports and monographs.

At the most general level, the IRLD research indicates that there
are viable alternatives to those current evaluation practices which
lack technical adequacy and which frequently are unrelated to making
instructional decisions. For the most part, evaluation of learning
disabled students is characterized by pre and post testing on
standardized measures and by informal teacher procedures during the
course of instruction. The IRLD research findings suggest that
procedures that de-emphasize standardized testing and that emphasize
continuous monitoring of pupil performance represent a more efficient
and effective approach to evaluation when providing special education
services to students. Further, the alternative approaches we have
developed require as little as one to three minutes of testing time in
a specific area, and can also be used to make identification and
eligibility decisions, and broader decisions about program
effectiveness and allocation of resources.

IRLD research focused mainly on identifying and analyzing
alternative evaluation measures in the areas of reading, spelling, and
written expression. Some initial work also was done in oral language
and mathematics. The mathematics work is being continued by local
school districts that participated in IRLD research. In addition,
research on non-academic measures (social adjustment) also was ...
be used to improve educational programs for students.

The specific nature of the alternative approach reflects the
notion that students must be measured on instructionally relevant
tasks that can be administered repeatedly, and that their performance
must be monitored continuously to identify when instructional,
motivational, or other types of changes are needed to maintain student
performance growth. Furthermore, the information obtained must be
used systematically to make changes for students. The need for this
type of approach is inherent in federal law (P.L. 94-142), which
requires that schools construct individual educational programs (IEPs)
for special education students. The IEP must specify curriculum-
based goals, and procedures for measuring progress toward those goals.
A critical component of these procedures is their usefulness in
generating data that can verify the extent to which program changes
lead to program goals.

The IRLD research verified that efficient measures could be
developed for reading, spelling, written expression, oral language,
and mathematics. Procedures also were identified for social
adjustment, but these were more situation specific, thus limiting
their usefulness. Extensive studies documented the technical adequacy
of the developed measures. Numerous implementation studies examined
the feasibility of using the developed measures and the alternative
approach to evaluation within special education programs. Measures
and procedures were revised on the basis of these studies.



... adopted the recommended evaluation procedures in their special
education programs. In some cases, the procedures were adopted only
for monitoring progress of special education students. In other
cases, the procedures were applied to the entire array of special
education decisions, including eligibility and termination
considerations. The types of programs adopting the procedures have
been quite varied. For example, one school system is a rural
educational cooperative comprised of six school districts. The school
districts have a total of about 5,000 students, with approximately 250
served in special education. Another school system is a large urban
district that has a total student population of over 37,000 students.
The total minority population accounts for 34.8% of the school
population. Special education services are provided to 5200 students
in this district.

The adoption of the direct and frequent measurement procedures by
school systems speaks for their usefulness and feasibility. An
excellent case study of how such a measurement and evaluation system
might be created and employed is provided in IRLD Monograph No. 20.

This chapter summarizes IRLD research findings related to the
nature of evaluation procedures typically used by special education
teachers. Three specific questions are addressed in this chapter:

   • What do teachers report to be their typical evaluation
     practices?

   • To what extent do teachers use direct and frequent measurement
     procedures for evaluation?

   • To what extent do teachers use the information obtained from
     direct and frequent measurement to make instructional changes?

For each question, the major findings are summarized and the data
sources from which the findings were obtained are listed (generally
ordered in terms of recency). Specific evidence for the major
findings then is presented.

What Do Teachers Report to be Their Typical Evaluation Practices?

Findings:

    a. Most teachers evaluate student progress four times during
       the school year.

    b. Teachers primarily rely on informal observations or informal
       tests to assess student mastery of IEP goals; they rarely
       use systematic evaluation procedures.

    c. The confidence that teachers have regarding the accuracy of
       their judgments about student performance is unjustified.

    d. Regardless of the evaluation procedure used, the frequency
       of measurement varies greatly from one teacher to the next.



Data Sources:

   • Survey and observation of special education teachers (RR 81)
   • Survey of LD teachers (RR 65, 80)

Evidence:

Surveys and observations revealed that special education teachers
primarily use informal observation and teacher judgment to formulate
... (RR 81). Over half of a group of nearly 150 special education
teachers indicated that they evaluate progress on IEP objectives
quarterly, 26% indicated weekly evaluation or at periodic review, and
less than 3% indicated only annual evaluation of student performance.
The majority of teachers (65.3%) relied on informal observations
compiled over each quarter to formulate their decisions as to whether
IEP objectives had been met. Informal observation not only was used
more often than norm-referenced tests, criterion-referenced tests, and
consultation, but also was the only method of progress evaluation used
by 20% of the teachers. The general pattern of choices of methods of
evaluation was the same for elementary and secondary teachers.
Assessment of a student's level of performance on material covered in
daily lessons involved informal observation for 80% of the teachers.
Almost all of the teachers were confident in their selected evaluation
procedures for determining student mastery. In fact, these teachers
indicated they were "sure" or "very sure" about the student's level of
performance. However, observations revealed that these teachers
failed to recognize when objectives were not met by their students;
for students who actually had failed objectives, teachers frequently
indicated that they had been met.

A group of LD teachers identified their evaluation procedures for
learning disabled students in reading, math, written language, and
spelling (RR 65, 80). No single procedure or general type of
evaluation was favored in reading and math. In these areas, teachers
most often mentioned criterion-referenced measures, teacher-made
tests/oral quizzes, informal observations of student performance,
direct and frequent measurement (i.e., precision teaching), and
standardized achievement tests. Teachers also included workbook
scoring as a frequently used procedure for evaluating math progress.
Informal observation of student performance was the chief form of
evaluation in written language, while teacher-made tests/oral quizzes
were clearly the most relied on form of evaluation in spelling.
Informal observation of student performance primarily was used to
evaluate students in other academic areas.

Teachers' frequency of evaluation varied with the area in which
evaluation was used. Weekly evaluations were most common for written
language and spelling, while daily evaluation in reading and math was
mentioned by one-third of the teachers.

Teachers noted a number of ways in which they use evaluation
information. Among the most commonly noted were discussing progress
with student and parent, changing instructional plans, reteaching
skills, and monitoring progress on IEP goals and objectives. Few
teachers indicated that evaluation information was used to assign
grades or review progress with the child study team. Most of the
teachers who used evaluation information to discuss progress with a
student did this on a daily or weekly basis; teachers who used
evaluation information when reviewing progress with the team did so
much less frequently (i.e., semi-annually, annually).

Most teachers were satisfied with the amount of time spent in
evaluation activities; one-fourth of the sample desired an increase in
evaluation, while 12% desired a decrease. Three-fourths of the
teachers indicated they spent 30% of their time in evaluation. The
remaining teachers indicated that they spent more than 30% of their
time in evaluation.

To What Extent Do Teachers Use Direct and Frequent Measurement
Procedures for Evaluation?

Findings:

    a. Most special education teachers are familiar with
       direct and frequent measurement strategies, but few
       use them.

    b. Teachers believe that direct and frequent measurement
       is time consuming and takes away from instructional
       time.

    c. Teachers who do use direct and frequent measurement
       strategies, on the average, use only a small
       proportion of a student's instructional time.

Data Sources:

   • Surveys of experimental study participants (RR 124)
   • Comparative study of formative evaluation effects (RR 97)
   • Survey of special educators (RR 67)
   • Interviews of special educators (RR 41)

Evidence:

Most surveyed teachers indicated they were familiar with direct
and frequent measurement strategies, but only from one-third to one-
half used the procedures with their students, even though only a few
believed such measurement was not useful (RR 67). Some teachers, who
were interviewed following their participation in one research project
in which they used direct and frequent measurement, indicated that the
procedures took too much time (RR 41). However, only 26% of the
participants in another direct and frequent measurement study (RR 97)
and only 4% in another (RR 124) indicated on surveys that the
procedures were very time consuming. Of those teachers who typically
used direct and frequent measurement, most reported that 20% or less
of their time was devoted to measurement activities (RR 67). However,
variability in time was considerable; some teachers estimated that
direct and frequent measurement activities took up to 30% of
instructional time. Yet, comparison of teachers' estimated and actual
measurement times indicated that teachers who used the techniques
generally overestimated how much time was involved (RR ?7).

To What Extent Do Teachers Use the Information Obtained from Direct
and Frequent Measurement to Make Instructional Changes?

Findings:

    a. Teachers primarily rely on personal observation and
       judgment to make changes in instructional programs. Few
       teachers use direct and frequent evaluation strategies
       to decide about changes in students' instructional plans
       or to decide when to reteach or review a skill.

    b. Teachers who are required to use direct and frequent
       measurement strategies make more instructional program
       changes for students than do teachers not required to
       use the strategies.

    c. Changes made by teachers are variable; the most common
       characteristic of changes is the infrequency with
       which they are made.

    d. Training in data evaluation procedures should include a
       focus on appropriate changes to make in instruction,
       motivation, and physical setting.

Data Sources:

   • Survey of LD teachers (RR 65, 80)
   • Comparative study of data-utilization rules (RR 64)
   • Comparative study of teacher goals (RR 51, 52)

Evidence:

The national survey of LD teachers revealed that subjective
teacher judgments played a major role in influencing intervention
decisions (RR 65, 80). Such factors were cited both in relation to
initial decisions about a student's program, and in relation to
program changes. Only 19% of the teachers said that changes in the
student's program would be based on "objective performance data" such
as direct and frequent measurement strategies and standardized tests.

In a comparative study, special education teachers were trained
in and required to implement continuous evaluation procedures using
two data-utilization rules (RR 64). The first rule involved comparing
student performance to a prespecified goal; the second involved a
general directive to improve continuously upon the student's current
performance level. The results demonstrated that teachers who used
either rule made more program changes and more often used student
performance data to modify students' programs than teachers who did
not use any data-utilization rule. Further, students' reading
performance improved more when the data-utilization rules were
implemented by their teachers than when such rules were not used.

In another study, the quality and quantity of teachers' changes
were compared for teachers using long-term goals and introducing
program changes at least every two weeks, and for teachers using short-
term goals and introducing program changes only as frequently as
necessary to ensure that their students would achieve goals. A
greater percentage of teachers in the short-term goal group made no
changes in students' reading programs (RR 52). When changes were
made, all teachers made a greater percentage of changes that were
characterized as instructional, as opposed to either motivational or
physical arrangement changes. Although teachers who set long-term
goals made more changes overall, no differences in reading performance
were revealed for the students in the two groups (RR 51). The finding
that teachers rarely made changes in students' programs highlighted
the need for more intensive training in data evaluation procedures.



Chapter 4

Reading Evaluation

This chapter summarizes IRLD research findings related to reading
evaluation. Three specific questions are addressed in this chapter:

   • What are the characteristics of a recommended direct measure of
     reading?

   • How should the direct reading measure be administered and
     scored?

   • To what extent are basal reader criterion-referenced tests
     technically adequate?

For each question, the major findings are summarized and the data
sources from which the findings were obtained are listed (generally
ordered in terms of recency). Specific evidence for the major
findings then is presented.

What Are the Characteristics of a Recommended Direct Measure of
Reading?

Findings:

    a. A direct measure of reading should focus on the
       behavior of reading aloud from text. Measures of this
       behavior are technically adequate (valid, reliable, and
       sensitive to student growth), have instructional
       utility, and are logistically feasible in the
       classroom. A second choice behavior to measure is
       reading aloud from word lists.

    b. When assessing a student's level of performance, the
       difficulty level of the direct reading measure should be
       as close as possible to the age-grade appropriate
       level, without reaching a level so frustrating that
       the measure is insensitive to student growth.

    c. When assessing a student's level of performance, reading
       test items (text passages or words) should be selected
       randomly from a mid-sized domain, such as stories or
       words within one basal reader.

    d. When selecting passages from one basal reader, it is
       desirable to select several "parallel" forms.

    e. When assessing a student's mastery within progress
       measurement, the reading mastery criterion should be
       an absolute raw score correct and incorrect criterion;
       a recommended criterion is 70 words correct per
       minute, with 7 or fewer errors.



Data Sources:

   • Norming study (RR 132)
   • Analysis of readability formulas (RR 129)
   • Comparative study of standardized and direct measures (RR 125)
   • Direct measure reliability study (RR 109)
   • Implementation study (RR 106)
   • Aggregation study (RR 94)
   • Study of curriculum differences (RR 93)
   • Comparative study of formative evaluation effects (RR 88)
   • Direct measures norm development (RR 87)
   • Study of alternative reading performance criteria (RR 59)
   • Comparative study of three reading placement procedures
     (RR 56, 57)
   • Comparative study of reading domains (RR 55)
   • Longitudinal study of learning trends on simple measures
     (RR 49)
   • Comparative study of reading domains and durations (RR 48)
   • Technical characteristics of direct reading measures (RR 20)

Evidence: ; ' 

The issue of what specific behaviors to measure when evaluating
reading was addressed by a series of studies on the technical
characteristics of direct reading measures (RR 20). Correlations of
five direct measures (reading aloud from text, reading aloud from word
lists, reading isolated words presented in text, identifying deleted
words in text, and giving word meanings) with standardized reading
tests indicated that performance on three of the direct measures
(reading aloud from text, reading aloud from word lists, and
identifying deleted words in text) was correlated highly with
performance on the standardized tests, with the validity coefficients
ranging between .73 and .94. Significant correlations were replicated
in several other studies (RR 56, 57, 88, 94). Both of the reading
aloud measures consistently correlated higher with the standardized
tests than did the cloze measure (identifying deleted words).
Comparisons of correct performance on the three measures across grades
(RR 20, 87) and across time within grades (RR 87) revealed that the
cloze measure was much less sensitive to student growth than either of
the reading aloud measures, and further that the reading aloud from
text measure was somewhat more sensitive to student growth than the
reading aloud from word lists measure. Separate analyses, however,
confirmed that reading aloud from word lists was more sensitive to
student growth than standardized tests (RR 125). The sensitivity of
the reading aloud measure across and within grades, and its
reliability, were confirmed in additional studies with different
student samples (RR 49, 106, 109, 132).

The issue of how to select the difficulty level of a reading
aloud measure was addressed in studies of validity (RR 57) and
sensitivity to student growth (RR 20, 57). These investigations
indicated that when correct performance scores were used, all
difficulty levels were correlated significantly with achievement test
scores (RR 57); however, reading aloud passages of mid-range
difficulty maximized slope, indicating greater sensitivity to student
growth (RR 20, 57). When error performance scores were used,
difficulty level affected the size of the correlation of the direct
measure with achievement test performance (RR 57).
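
The idea of slope as an index of sensitivity to growth can be
illustrated with a short sketch. It assumes, hypothetically, that
slope is the ordinary least-squares slope of words read correctly per
minute regressed on the week of measurement. The function name and the
sample scores are illustrative only and are not taken from the IRLD
studies.

```python
def growth_slope(weeks, scores):
    """Least-squares slope of scores regressed on weeks (score units per week)."""
    n = len(weeks)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two paired observations")
    mean_x = sum(weeks) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in weeks)
    if sxx == 0:
        raise ValueError("weeks must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, scores))
    return sxy / sxx


# Hypothetical weekly words-correct-per-minute scores for one student.
weeks = [1, 2, 3, 4, 5]
scores = [40, 43, 44, 48, 50]
print(growth_slope(weeks, scores))
```

Under this assumption, a steeper slope on one difficulty level than on
another, for the same students over the same weeks, would be read as
greater sensitivity to growth.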

The issue of the appropriate domain from which measurement items
should be selected to assess a student's performance was addressed
directly with respect to reading aloud from word lists (RR 48). In
comparisons of measures derived from a limited (200 words)
instructional level domain, an entire within-grade level domain, and
an across grades (preprimer-grade 4) domain, it was found that as the
size of the domain increased, sensitivity to student growth decreased.
However, variability of slope was greatest for measures selected from
the most limited domain size; minimal variability is desired.
Analyses of the effect of domain size on the judged effects of
instructional interventions did not produce clear results (RR 55).
Given that it is easier to draw samples of items from a larger domain,
and that a somewhat restricted domain results in greater sensitivity
to student growth and reduced variability, a mid-size domain was
recommended (RR 48). The widely-accepted procedure of random
selection from the domain also was recommended (RR 20).

The issue of the appropriate procedure for selecting reading
passages was highlighted by a study of the reliability and validity of
alternative performance criteria (RR 59). In this study, reading
passages were sampled randomly until the readability levels of two
passages coincided with the mean readability scores for the reading
levels. The number of passages that had to be selected ranged from 5
to 14; over half of the 19 textbooks sampled required the selection of
10 or more passages before two representative passages could be
identified. The problem is further complicated by the demonstrated
inaccuracy of readability formulas (RR 129). First, there appears to
be minimal agreement among several formulas. Second, the difficulty
of a passage also seems to be influenced by the background of the
student reading the passage. A suggested procedure for reducing error
and increasing technical adequacy is both to create parallel forms of
passages by selecting several alternative passages and to administer
them on consecutive days so that pupils' scores can be aggregated or
so that administrations can be repeated until results agree on at
least two consecutive days (RR 59).
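
The suggested procedure can be sketched in a few lines. This is a
minimal illustration under stated assumptions, not the procedure used
in RR 59: the passage labels are hypothetical, the mean is an assumed
choice of aggregate, and "agreement" between two consecutive days is
assumed to mean that the scores differ by no more than a chosen
tolerance.

```python
import random
from statistics import mean


def pick_parallel_passages(passages, k, seed=None):
    """Randomly draw k distinct passages from one reader as parallel forms."""
    rng = random.Random(seed)
    return rng.sample(passages, k)


def aggregate(scores):
    """Aggregate scores from consecutive days (the mean is an assumption)."""
    return mean(scores)


def first_agreement(daily_scores, tolerance):
    """Index of the second of two consecutive days whose scores differ by
    no more than `tolerance` (an assumed definition of agreement), or None."""
    for i in range(1, len(daily_scores)):
        if abs(daily_scores[i] - daily_scores[i - 1]) <= tolerance:
            return i
    return None


# Hypothetical passage labels for one basal reader.
passages = [f"story-{n}" for n in range(1, 21)]
forms = pick_parallel_passages(passages, k=3, seed=0)
print(forms)
print(aggregate([52, 58, 55]))
print(first_agreement([40, 52, 55, 54], tolerance=3))
```

Either branch of the procedure yields a single score per student: the
aggregate over all administrations, or the score reached once two
consecutive days agree.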

The issue of the appropriate criteria to apply to determine
whether a student has achieved mastery of materials was addressed in a
study that examined seven criteria recommended by various individuals.
When the seven criteria were applied to reading aloud from text scores
of students, four were found to be sensitive to student growth, to
demonstrate good criterion validity with standardized tests, and to
result in at least 50% agreement with teacher judgments (RR 57, 93).
Given that criteria involving the calculation of percentages require
extra teacher time, an absolute raw score criterion of 50-70 words
correct per minute with 7 or fewer errors was recommended.
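
An absolute raw score criterion of this kind is simple to apply
mechanically. The sketch below is illustrative only: the function and
parameter names are hypothetical, the error limit is assumed to be a
raw count for the timed sample, and the words-correct threshold is left
as a parameter because the text cites both 70 and a 50-70 range.

```python
def meets_mastery(words_correct, errors, minutes=1.0,
                  min_correct_per_minute=70, max_errors=7):
    """Apply an absolute raw-score criterion to one timed reading sample."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    correct_rate = words_correct / minutes  # words read correctly per minute
    return correct_rate >= min_correct_per_minute and errors <= max_errors


# Hypothetical one-minute samples.
print(meets_mastery(words_correct=72, errors=5))
print(meets_mastery(words_correct=72, errors=8))
print(meets_mastery(words_correct=65, errors=2, min_correct_per_minute=50))
```

No percentage needs to be computed, which is the practical advantage
cited for a raw score criterion.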

How Should the Direct Reading Measure be Administered and Scored?

Findings:

a. The duration of a direct reading measure should be from one to
   three minutes each time it is administered.

b. Reading performance or progress on a direct reading measure
   should be scored in terms of the number of words read correctly.

c. Within an evaluation system, the direct reading measure should
   be administered at least two to three times per week.

d. The determination of whether to measure performance or progress
   should be made in light of individual student and teacher
   needs. Both procedures produce technically adequate data.

Data Sources:

• Single subject study (RR 120)

• Direct measure reliability study (RR 109)

• Direct measures norm development (RR 87)

• Comparative study of data-utilization rules (RR 64)

• Comparative study of teacher goals (RR 61, 62)

• Comparative study of three reading placement procedures (RR 57)

• Teacher efficiency studies (RR 53)

• Comparative study of reading domains and durations (RR 48)

• Development of data-utilization systems (RR 23)

• Technical characteristics of direct reading measures (RR 20)

Evidence:

The issue of the duration of a direct reading measure was
addressed in several studies. In studies of the technical
characteristics of reading measures (RR 20) and in the development of
data-utilization systems (RR 23), a one-minute assessment of reading
was found to validly index reading proficiency. Although correlations
between 30-second and 60-second reading aloud trials were as high as
.92 (RR 20), the 30-second duration was less sensitive to student
growth and was characterized by greater intra-individual variability
(RR 48). Comparisons of 30-second and 3-minute durations indicated
that the longer duration resulted in reduced intra-individual
variability and increased reliability (RR 48). Given the logistical
benefit of shorter tests weighed against the technical and
instructional superiority of longer tests, the recommendation of a one
to three minute duration was made.

Several studies provided evidence on the issue of how to score
performance on direct reading measures; they consistently found that
either correct rate or percentage correct is a more valid score than
error rate. Studies of the technical adequacy of direct reading
measures (RR 20) and a reliability study (RR 109) indicated that
correct performance is a more valid measure of reading performance
than is error performance. Correct performance scores were found to
discriminate among reading proficiencies as well as scores reflecting
a combination of correct and incorrect performance (RR 20). Further,
correct rate stability coefficients, indicative of a measure's
test-retest reliability, were higher than error rate stability
coefficients (RR 87). In addition, validity correlations for error
rate were unreliable (RR 20). Given that one additional step is
required to calculate a percentage correct score, it was recommended
that correct rate be scored. For instructional information,
practitioners might want to monitor both correct rate and error rate.

The issue of the frequency with which the direct reading measure
should be given in an evaluation system was addressed indirectly by
data collected during the development of data-utilization systems (RR
23). Students who were measured on a daily basis showed greater
progress than students who were measured on a weekly basis. Daily
measurement is the ideal; however, teachers find daily measurement to
be cumbersome and time consuming (RR 53). In light of this, a
compromise solution of two to three times per week is recommended.

The issue of whether a reading evaluation system should use
progress measurement (in which the measurement domain changes each
time a student masters a segment of the curriculum) or performance
measurement (in which the measurement domain remains the same) was
examined in several studies. High correlations (concurrent validity)
were found for both performance and progress measures of reading; both
were highly predictive of scores on standardized achievement tests (RR
20, 57). The progress measures studied were based on mastery of books
within a reading curriculum. When the effect of the measurement
system (progress vs. performance) on student reading achievement was
examined, no significant differences were found (RR 61); a similar
finding for spelling performance was provided by a single-subject
experiment (RR 120). However, in a study of goals and data
utilization, teachers using progress measurement were more realistic
and optimistic about their students' programs than were teachers who
used performance measurement (RR 62); further, progress measurement
teachers introduced fewer unnecessary program modifications. Also, in
a school district where direct measurement procedures were adopted
district-wide, teachers more often selected progress measurement for
reading than they selected performance measurement (RR 64). Since
there is no evidence of differences in the technical adequacy of the
two approaches, the decision may be made appropriately on the basis of
preferences and needs.

To What Extent Are Basal Reader Criterion-Referenced Tests Technically
Adequate?

Findings:

Despite the content and face validity of basal reader
criterion-referenced tests, their technical adequacy is often
questionable.

Data Sources:

• Analyses of basal reader criterion-referenced tests (RR 113,
  122, 128, 130)

Evidence:

Analyses of the technical characteristics of selected criterion-
referenced tests from Houghton-Mifflin (RR 113), Ginn (RR 122),
Scott-Foresman (RR 128), and Holt (RR 130) indicated considerable
variability in technical adequacy. The reliability and validity of
the Houghton-Mifflin end-of-level 11 basic reading test were found to
be less than adequate. For the Ginn 720 end-of-level 11 mastery test,
reliability and validity were acceptable for the composite test
scores, but variable for the subtests. Reliability and validity of
the Scott-Foresman end-of-book 9 criterion-referenced test appeared
acceptable for the total test, but not for some of the scale scores.
Analyses of the Holt management program level 13 test indicated that
the criterion-related validity was acceptable, but that the
test-retest reliability and the convergent and discriminant validity
were questionable. It was concluded that test consumers must demand
empirical validation before relying on criterion-referenced test data
for making instructional decisions.



Chapter 5 
Spelling Evaluation

This chapter summarizes IRLD research findings related to
spelling evaluation. Two specific questions are addressed in this
chapter:

• What are the characteristics of a recommended direct measure
  of spelling?

• How should the direct spelling measure be administered and
  scored?

For each question, the major findings are summarized and the data
sources from which the findings were obtained are listed (generally
ordered in terms of recency). Specific evidence for the major
findings then is presented.

7. What Are the Characteristics of a Recommended Direct Measure of
Spelling?

Findings:

a. A direct measure of spelling should focus on the behavior of
   writing words dictated from lists. Measures of this behavior
   are technically adequate (valid, reliable, and sensitive to
   student growth), have instructional utility, and are
   logistically feasible in the classroom. A second choice
   behavior to measure is writing compositions.

b. When assessing a student's level of performance, the
   difficulty level of the direct spelling measure should be
   within one to two grades of the student's instructional level.

c. When assessing a student's level of performance, words
   included in a dictated spelling list should be selected
   randomly from the domain of words in the spelling text or basal
   reader.

Data Sources:

• Norming study (RR 132)

• Direct measure reliability study (RR 109)

• Direct measures norm development (RR 87)

• Longitudinal study of learning trends on simple measures
  (RR 49)

• Development of data-utilization systems (RR 23)

• Technical characteristics of direct spelling measures (RR 21)

Evidence:

The issue of what specific behaviors to measure when evaluating
spelling was addressed by a series of studies on the technical
characteristics of direct spelling measures (RR 21). Correlations of
two direct measures (writing words dictated from lists and writing
compositions) with standardized spelling tests indicated that
performance on the writing words dictated from lists measure was
correlated highly with standardized tests, with the validity
coefficients ranging between .80 and .96. A moderately high
correlation (.70) was obtained between spelling performance on the
writing compositions measure and performance on a standardized
spelling test. Test-retest, alternate-form, and interjudge
reliability levels were high, at least when correct performance was
scored (RR 109). Comparisons of correct performance on the writing
words dictated from lists measure across grades and across time within
grades revealed that this measure was sensitive to student growth (RR
21, 49, 87, 132).

The issue of the appropriate difficulty level of a measure of
writing words dictated from lists to assess a student's performance
level was addressed by a study of the technical characteristics of
direct spelling measures (RR 21) and a study on the development of
norms for direct measures (RR 87). When correct performance scores
were used, list difficulty had little effect on the validity of the
dictated word list measure (RR 21). When materials were selected from
material around the student's instructional grade level, the measure
was sensitive to student growth (RR 87). Given that it is generally
easier to select words from a grade-level spelling text or from the
lists of words in a basal reader, the recommendation was made that
words included in a dictated spelling list measure be within one to
two grades of the student's instructional level.

The issue of the appropriate domain from which words should be
selected to assess a student's performance was addressed by comparing
student progress when teachers made program changes on the basis of
student performance on words from a small domain (a within-grade-level
list of words) and when teachers made changes on the basis of student
performance on words from a large domain (a list of words selected
from across several grade levels) (RR 23). Both domains produced
measures that were sensitive to student growth over time.
Examinations of the validity of curriculum-based spelling measures
when words were selected in three ways (randomly, arbitrarily, and
ordered from easy to difficult) indicated that both randomly selected
words and arbitrarily selected words had high correlations with
achievement tests, but ordered words had low concurrent validity (RR
21). Given the lack of additional research, the widely-accepted
procedure of random selection from the domain was recommended.

8. How Should the Direct Spelling Measure be Administered and Scored?

Findings:

a. The duration of a direct spelling measure should be from two
   to three minutes each time it is administered. Paced dictation
   at a rate of 15 seconds per word is an acceptable procedure.

b. Performance on a direct spelling measure should be scored in
   terms of either the number of words spelled correctly or the
   number of letters in correct sequence. Letters in correct
   sequence is preferred for low-functioning students.

c. Within an evaluation system, the direct spelling measure
   should be administered at least two times per week.

d. The determination of whether to measure performance or
   progress should be made on the basis of individual student and
   teacher needs. The two procedures produce similar results.

Data Sources:

• Single subject study (RR 120)

• Direct measure reliability study (RR 109)

• Direct measures norm development (RR 87)

• Comparative study of data-utilization rules (RR 64)

• Teacher efficiency studies (RR 53)

• Development of data-utilization systems (RR 23)

• Technical characteristics of direct spelling measures (RR 21)

Evidence:

Intercorrelations among scores from three test durations (1, 2,
and 3 minutes) were all high; further, all test durations demonstrated
acceptable concurrent validity with standardized achievement tests (RR
21). Given that limited behavior samples reduce a measure's
sensitivity to student growth and that low-functioning students will
write few words during a short duration test, it was recommended that
the duration of the test be from two to three minutes. Paced
dictation at a rate of 15 seconds per word was used in seven different
studies with demonstrated validity, reliability, and sensitivity to
student growth (RR 21, 23, 87, 109, 120). Given that the behavior
sample from low-functioning students probably would be low without
pacing, paced dictation was recommended; the 15-second pacing appeared
appropriate on the basis of its demonstrated technical adequacy.
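
The arithmetic of paced dictation fixes the length of the word list: at
15 seconds per word, a two-minute test holds 8 words and a three-minute
test holds 12. The sketch below combines that calculation with random
selection from a word domain. The function name and the word domain are
hypothetical placeholders, not materials from the IRLD studies.

```python
import random


def dictation_list(domain_words, duration_minutes, seconds_per_word=15, seed=None):
    """Randomly select as many words as fit in the test at the given pace."""
    n_words = int(duration_minutes * 60 // seconds_per_word)
    if n_words > len(domain_words):
        raise ValueError("domain has too few words for this duration")
    # Sampling without replacement, so no word is dictated twice.
    return random.Random(seed).sample(domain_words, n_words)


# Hypothetical domain standing in for the words of one spelling text.
domain = [f"word{n}" for n in range(1, 201)]
two_minute_list = dictation_list(domain, duration_minutes=2, seed=1)
three_minute_list = dictation_list(domain, duration_minutes=3, seed=1)
print(len(two_minute_list), len(three_minute_list))
```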

Studies on the issue of how to score performance on direct
spelling measures consistently found that correct performance scores
were more valid and reliable than error scores (RR 21, 109). Both the
number of words spelled correctly and the number of correct letter
sequences showed high correlations with standardized achievement tests
(RR 21). In addition, interscorer reliability was very high for both
types of scores (RR 87, 109). However, correct letter sequence scores
were found to be more sensitive to student growth than correct word
scores (RR 87).
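
The difference between the two scores can be seen in a small sketch.
The scoring rule here is a simplified approximation chosen for
illustration, not the scoring procedure documented in the IRLD reports:
each word is padded with boundary markers, and a letter sequence is
counted as correct when an adjacent pair in the response also occurs in
the target word. All function names are hypothetical.

```python
from collections import Counter


def letter_pairs(word):
    """Adjacent letter pairs, with '^' and '$' marking the word boundaries."""
    padded = "^" + word.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def correct_letter_sequences(response, target):
    """Approximate count of correct letter sequences in a written response."""
    # Multiset intersection: a pair is credited at most as often as it
    # occurs in the target word.
    matched = Counter(letter_pairs(response)) & Counter(letter_pairs(target))
    return sum(matched.values())


def words_correct(responses, targets):
    """Number of responses that match their target word exactly."""
    return sum(r.lower() == t.lower() for r, t in zip(responses, targets))


targets = ["cat", "cat"]
responses = ["cat", "kat"]
print(words_correct(responses, targets))
print([correct_letter_sequences(r, t) for r, t in zip(responses, targets)])
```

Under this approximation a partly correct spelling such as "kat" earns
partial credit in letter sequences but none in words correct.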

The issue of the frequency with which the direct spelling measure
should be given in an evaluation system was addressed by data
collected during the development of data-utilization systems (RR 23).
Students who were measured in spelling on a daily basis showed greater
progress than students who were measured on a weekly basis. Daily
measurement is the ideal since seven data points are needed to make
program decisions; however, teachers find daily measurement to be
cumbersome and time consuming (RR 53). Thus, a compromise of at least
two times per week is recommended.

The issue of whether a spelling evaluation system should use
progress measurement (in which the measurement domain changes each
time a student masters a segment of the curriculum) or performance
measurement (in which the measurement domain remains the same) was
addressed in a study that compared the effect of the two systems on
one student's spelling performance (RR 120). No differences were
found in spelling performance as a function of the system. Given that
teachers sometimes prefer one system over the other (RR 64), it was
recommended that the decision be made on the basis of teacher and
student preferences and needs.

Chapter 6
Written Expression Evaluation

This chapter summarizes IRLD research findings related to written
expression evaluation. Two specific questions are addressed in this
chapter:

• What are the characteristics of a recommended direct measure of
  written expression?

• How should the direct written expression measure be
  administered and scored?

For each question, the major findings are summarized and the data
sources from which the findings were obtained are listed (generally
ordered in terms of recency). Specific evidence for the major
findings then is presented.

What Are the Characteristics of a Recommended Direct Measure of
Written Expression?

Findings:

A direct measure of written expression should focus on the
behavior of writing compositions in response to a verbal stimulus.
Certain measures of this behavior (total words written, total words
spelled correctly, or letters in correct sequence) are technically
adequate (valid, reliable, and sensitive to student growth), have
instructional utility, and are logistically feasible in the
classroom.

Data Sources:

• Norming study (RR 132)

• Comparative study of standardized and direct measures (RR 125)

• Direct measure reliability study (RR 109)

• Longitudinal study of learning trends on simple measures
  (RR 49)

• Technical characteristics of direct written expression
  measures (RR 22)

Evidence:

A series of studies demonstrated that story starters, topic
sentences, and picture stimuli could be used to collect written
compositions from students (RR 22). When compositions obtained from
these approaches were scored in terms of total words, words spelled
correctly, and correct letter sequences, correlations between scores
on the direct measures and standardized achievement tests were high.
Internal consistency reliability also was high for all three.
However, since pictorial stimuli generally are more expensive to
produce and are less easily incorporated into a response form, verbal
stimuli are preferred. Both story starters and topic sentences may be
printed at the top of lined paper to allow students to look at the
stimulus as well as listen to it.

Comparisons of story starter performance in terms of words
written and correct letter sequences, across grades and within grades,
indicated that both measures demonstrated adequate sensitivity to
student growth (RR 49, 132). Further, the direct measures of written
expression were found to be much more sensitive to pupil progress over
10 weeks than a standardized test, on which virtually no growth was
evident (RR 125). Test-retest, alternate-form, and interjudge
reliabilities generally were quite high when correct performance was
scored (RR 109), although in some studies reliability has been below

10. How Should the Direct Written Expression Measure be Administered
and Scored?

Findings: 

a. The duration of a direct written expression measure should be
   three minutes each time it is administered.

b. Performance on a direct written expression measure should be
   scored in terms of either total words written or number of
   correctly spelled words.

c. Within an evaluation system, two or three writing samples
   should be elicited on each measurement occasion.

Data Sources:

• Direct measure reliability study (RR 109)

• Aggregation study (RR 94)

• Direct measures norm development (RR 87)

• Comparative study of written expression scoring procedures
  (RR 84)

• Reliability of written expression measures (RR 50)

• Longitudinal study of learning trends on simple measures
  (RR 49)

• Technical characteristics of direct written expression
  measures (RR 22)

Evidence:

Correlations between performance on the direct written expression
measure and a developmental sentence score at the end of three, four,
and five minutes were all high (RR 22). The three-minute samples of
writing produced the widest range of scores. Use of a three-minute
duration in other studies produced data that were very sensitive to
student growth across and within grade levels (RR 22, 87).

Comparisons of six scoring procedures (mean T-unit length, mature
words, total words written, large words, words spelled correctly, and
correct letter sequences) in terms of validity, reliability, and
sensitivity to student growth indicated that three (total words
written, words spelled correctly, and correct letter sequences) had
the greatest technical adequacy (RR 22, 50). Scores of mature words,
total words written, words spelled correctly, and correct letter
sequences correlated significantly with standardized written
expression measures (RR 22) and evidenced good test-retest and
parallel-form reliability. Discriminative validity with respect to
grade levels also was demonstrated (RR 49, 87). However, since mature
words is more difficult to score and correct letter sequences is
quite time consuming, scoring either total words written or number of
correctly spelled words was recommended. Scoring of correct
performance is recommended since the reliability of incorrect
performance is too low for it to be used in educational decision
making (RR 109). Interjudge agreement in scoring total words
written, words spelled correctly, and correct letter sequences was
very high (RR 84).

Low test-retest and parallel-form reliability coefficients were
found for single written expression samples (RR 50). Aggregating
three writing samples and using the mean score resulted in acceptable
reliability (RR 94). On this basis, it was recommended that at least
two, and preferably three, writing samples should be elicited on each
measurement occasion.
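
A minimal sketch of this aggregation follows. It assumes that words are
separated by whitespace and that spelling is checked against a supplied
set of known words; both are simplifications made for illustration, and
the function names and sample sentences are hypothetical.

```python
from statistics import mean


def total_words_written(sample):
    """Count whitespace-separated words in one writing sample."""
    return len(sample.split())


def words_spelled_correctly(sample, known_words):
    """Count words found in `known_words` after stripping punctuation."""
    tokens = [w.strip(".,;:!?\"'()").lower() for w in sample.split()]
    return sum(1 for w in tokens if w and w in known_words)


def occasion_score(samples, scorer):
    """Mean score over the writing samples from one measurement occasion."""
    if len(samples) < 2:
        raise ValueError("at least two samples are recommended per occasion")
    return mean([scorer(s) for s in samples])


# Three hypothetical samples elicited on one occasion.
samples = [
    "The dog ran home.",
    "A cat sat on the mat today.",
    "We went out now.",
]
print(occasion_score(samples, total_words_written))
```

Passing a different scorer, for example one built on
`words_spelled_correctly`, aggregates the second recommended score in
the same way.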




Chapter 7
Oral Language Evaluation

This chapter summarizes IRLD research findings related to oral
language evaluation. Two specific questions are addressed in this
chapter:

• What are the characteristics of a recommended direct measure
  of oral language?

• How should the direct oral language measure be administered
  and scored?

For each question, the major findings are summarized and the data
sources from which the findings were obtained are listed (generally
ordered in terms of recency). Specific evidence for the major
findings then is presented.

What Are the Characteristics of a Recommended Direct Measure of Oral
Language?

Findings:

A direct measure of oral language should focus on the behavior of
describing a picture stimulus.

Data Sources:

• Study of expressive language (RR 83)

Evidence:

An initial investigation of the relationship between a direct
measure of oral language and more elaborate, psychometrically adequate
measures of the quality of language (semantic/syntactic complexity and
descriptive accuracy scores) indicated that certain measures of
children's picture descriptions (number of non-repetitive words) were
highly correlated with the more elaborate methods of analyzing
language samples (RR 83). Concurrent validity of the direct oral
language measure was supported by correlations between .89 and .97
with the semantic/syntactic complexity and descriptive accuracy
scores. Additional research is needed on other technical
characteristics (reliability, sensitivity to student growth) of a
direct oral language measure.

How Should the Direct Oral Language Measure be Administered and
Scored?

Findings:

a. Performance on a direct oral language measure should be
   scored in terms of the number of non-repetitive words spoken.

b. The oral language measure should be administered by a
   familiar examiner.

Data Sources:

• Study of expressive language (RR 83)

Evidence:

When children's oral language samples were scored in terms of the
number of non-repetitive words spoken, high correlations (.89 to .97)
were found with psychometrically adequate and more complicated
measures of semantic/syntactic complexity and descriptive accuracy
scores (RR 83). In addition, both the quality and quantity of spoken
language was greater when the tester was familiar rather than
unfamiliar, suggesting that optimal performance will be obtained by a
familiar examiner.
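
As a rough illustration of the scoring rule, the sketch below counts
the words in a transcribed picture description. It assumes that
"non-repetitive words" means distinct words, with each word counted
once however often it is spoken; the source does not spell out the
counting rule, so this interpretation, the function name, and the
sample transcript are all assumptions.

```python
import re


def non_repetitive_words(transcript):
    """Count distinct words in a transcript (case-insensitive)."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return len(set(words))


# Hypothetical transcript of a child's picture description.
print(non_repetitive_words("The dog, the big dog is running"))
```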



Chapter 8
Mathematics Evaluation

This chapter summarizes IRLD research findings related to
mathematics evaluation. Two specific questions are addressed in this
chapter:

• What are the characteristics of a recommended direct measure
  of mathematics?

• How should the direct mathematics measure be administered
  and scored?

For each question, the major findings are summarized and the data
sources from which the findings were obtained are listed (generally
ordered in terms of recency). Specific evidence for the major
findings then is presented.

What Are the Characteristics of a Recommended Direct Measure of
Mathematics?

Findings:

Preliminary data suggest that a direct measure of mathematics
should focus on the calculation of math computation problems.

Data Sources:

• Norming study (RR 132)

• Direct measure reliability study (RR 109)

Evidence:

A study of test-retest reliability, alternate-form reliability,
and interjudge reliability indicated that the reliability of correct
performance scores on most computation problems was good (RR 109).
Interjudge reliability was very high across all types of problems (.90
to .99) and test-retest reliability was good (.78 to .93), but
alternate-form reliability was only moderate on addition, subtraction,
and multiplication (up to .72) and low on division (.48). In a local
norming study, the alternate-form correlation was low for both
multiplication (.61) and division (.48) (RR 132). Although math
measures used in the local norming study showed grade level
differences, they did not always reflect higher performance by older
students. However, the measurement task in that study did vary for
different grades in some cases. Additional research is needed on
sensitivity to student growth and other technical characteristics
(e.g., validity) of a direct mathematics measure. Such research may
lead to refinement of the recommended direct measure of mathematics.

How Should the Direct Mathematics Measure be Administered and Scored?

Findings:

     a. The types of problems presented to a student may be
        determined by the grade level of the student or may
        sample from all types of math functions.

     b. Performance on a direct mathematics measure should be
        scored in terms of the number of digits correct.

     c. Within an evaluation system, several samples should
        be elicited on each measurement occasion.

Data Sources:

     • Direct measure reliability study (RR 109)

Evidence:

When students in grades 4 and 5 were tested on math problems
limited according to their grade level, most reliability coefficients
were in an acceptable range (RR 109). Only interjudge reliability
(.93) and test-retest reliability (.93) were calculated for a single
measure that included all math functions. Additional data are needed
before a specific recommendation can be made as to the scope of
problems included in a direct measure of mathematics.

Reliability data clearly indicated that correct performance on
math problems should be scored (RR 109). While correct performance
scores produced good to high reliability coefficients, incorrect
performance scores often produced very low reliability coefficients
(e.g., .09). The correct performance scores were calculated by
counting the number of digits correct; a digit was considered correct
if it appeared in the correct place within the answer.
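
The digits-correct rule can be sketched in code. This is a minimal illustration only, not the scoring procedure documented in RR 109: it assumes answers are compared place by place starting from the ones digit, and the function names digits_correct and score_probe are hypothetical.

```python
def digits_correct(student_answer: str, correct_answer: str) -> int:
    """Count digits in the student's answer that match the correct
    answer in the same place value (assumed right-aligned)."""
    # Walk both answers from the ones place leftward; zip stops at the
    # end of the shorter answer, so missing digits earn no credit.
    pairs = zip(reversed(student_answer.strip()),
                reversed(correct_answer.strip()))
    return sum(1 for s, c in pairs if s.isdigit() and s == c)


def score_probe(responses) -> int:
    """Total digits correct over (student_answer, correct_answer) pairs."""
    return sum(digits_correct(s, c) for s, c in responses)
```

Under these assumptions, an answer of 142 to a problem whose answer is 132 earns two digits correct (the ones and hundreds places) rather than being scored simply as wrong.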

Alternate-form reliability coefficients for direct mathematics
measures sometimes were lower than desirable (e.g., division = .48),
suggesting that several alternate forms should be administered on each
testing occasion (RR 109). The student's score would be an average of
the scores on the repeated administrations.
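
As a rough illustration of why averaging several alternate forms may help, the sketch below averages the scores from one testing occasion and applies the Spearman-Brown formula to project the reliability of an average of k parallel forms. The Spearman-Brown formula is a standard psychometric result and is not cited in the research reports; the function names are hypothetical, and the forms are assumed to be parallel.

```python
from statistics import mean


def occasion_score(form_scores):
    """Average the digits-correct scores from the alternate forms
    given on one testing occasion."""
    if not form_scores:
        raise ValueError("at least one form score is required")
    return mean(form_scores)


def spearman_brown(single_form_reliability: float, k: int) -> float:
    """Projected reliability of the average of k parallel forms."""
    if k < 1:
        raise ValueError("k must be at least 1")
    r = single_form_reliability
    return k * r / (1 + (k - 1) * r)
```

With the division coefficient of .48 reported above, spearman_brown(0.48, 3) gives approximately .73, which suggests that three forms per occasion could bring the averaged score into a more acceptable range if the parallel-forms assumption holds.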




Chapter 9
Social Adjustment Evaluation

This chapter summarizes IRLD research findings related to
social adjustment evaluation. Two specific questions are addressed in
this chapter:

     • What are the characteristics of a recommended direct measure
       of social adjustment?

     • How should the direct social adjustment measure be
       administered and scored?

For each question, the major findings are summarized and the data
sources from which the findings were obtained are listed (generally
ordered in terms of recency). Specific evidence for the major
findings then is presented.

What Are the Characteristics of a Recommended Direct Measure of Social
Adjustment?

Findings:

     - A direct measure of social adjustment should focus on
       general classroom conduct and social interaction. The
       specific behaviors should be identified within the
       specific setting of interest.

Data Sources:

     • Study of variables influencing direct social adjustment
       measures (RR 82)

     • Technical characteristics of direct social adjustment
       measures (RR 24)

     • Measuring classroom behavior (RR 6)

Evidence:

Observational studies of behaviors that index social adjustment
indicated that the specific behaviors associated with social
functioning variables vary with the specific setting, and to some
extent with the sex of the student (RR 24, 82). An initial study
revealed that the degree of discrepancy between the rate of a target
student and his or her peers on several specific measures (noise, out
of place, physical contact or destruction, off task) agreed with
teachers' identifications of problem students (RR 5). Another study
suggested that either the frequency of occurrence of peers talking
with the target child or the number of different peers talking with
the target child was a valid indicator of social status (RR 24). In a
third study, only the frequency of occurrence of peers talking with
the target child reliably correlated with social status (RR 24).

An extensive study of behaviors that correlated with social
functioning (both social status and teacher-perceived behavior
problems) suggested that peer behavior toward the target student
correlated with social status and the target student's behavior (e.g.,
aggression) correlated with teacher ratings (RR 82). However, in
this study, differences in the patterns of correlations existed
between boys and girls across settings. Peer approaches related
consistently to social status for boys in both structured academic and
unstructured non-academic settings, but only in an academic setting
for girls. Problem behaviors clearly related to teacher ratings of
girls in academic settings, but aggression was the consistent
predictor of teacher ratings of boys in the same settings. Because of
the pervasive influence of setting, it was recommended that data be
collected on the target student's general classroom conduct and social
interaction and on a classmate's general classroom conduct and social
interaction.

At this point, it appears that the rate of student initiations is
a prime behavior to monitor. A discrepancy between the two students
would provide a basis for monitoring the target student's social
adjustment.
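
The target-versus-classmate comparison can be illustrated with a small calculation. The sketch below is only one possible way to express a discrepancy: the research reports do not define a formula, so the use of a simple difference in rates per minute, and the function names, are assumptions made for illustration.

```python
def rate_per_minute(count: int, minutes: float) -> float:
    """Rate of a tallied behavior per minute of observation."""
    if minutes <= 0:
        raise ValueError("minutes of observation must be positive")
    return count / minutes


def discrepancy(target_count: int, peer_count: int, minutes_each: float) -> float:
    """Difference between the target student's rate and a classmate's
    rate, assuming both were observed for the same number of minutes."""
    return (rate_per_minute(target_count, minutes_each)
            - rate_per_minute(peer_count, minutes_each))
```

A positive value means the target student showed the behavior more often than the classmate; for a behavior such as initiations, a negative value would be the pattern of interest.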

16. How Should the Direct Social Adjustment Measure be Administered and
Scored?

Findings:

     a. Administration of the direct social adjustment measure
        could involve observation of the target student and
        classmates on an interval-sampling schedule.

     b. Performance could be scored by tallying occurrences of
        the target behaviors.

Data Sources:

     • Study of variables influencing direct social adjustment
       measures (RR 82)

     • Technical characteristics of direct social adjustment
       measures (RR 24)

Evidence:

During investigations of the technical adequacy of various
measures of social adjustment, a 60-second observation interval was
used to collect data on the occurrence of specific behaviors (RR 24,
82). This schedule could be applied to a situation where the target
student and one classmate would be observed during alternate intervals
to obtain a measure of target student discrepancy from a peer. During
each observation interval, behavior was coded simultaneously on
different categories; two behaviors within a category were coded
during one interval if a 5-second break clearly separated the
behaviors. Additional research is needed to establish the logistical
feasibility of this procedure and its utility for interventions.
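
The alternating-interval schedule and the 5-second rule can be sketched as a tallying routine. This is an illustration under stated assumptions, not the coding protocol from RR 24 or RR 82: it assumes each coded behavior is logged with a start time, an end time, and a category; that even-numbered intervals belong to the target student and odd-numbered intervals to the classmate; and that the break is measured from the end of the previously coded behavior in the same category. All names are hypothetical.

```python
from collections import Counter

INTERVAL_SECONDS = 60   # observation interval length reported in the text
MIN_BREAK_SECONDS = 5   # break required to code a second behavior in a category


def tally_observations(events):
    """Tally behaviors from (start_sec, end_sec, category) records.

    Returns {"target": Counter, "classmate": Counter} keyed by category.
    """
    tallies = {"target": Counter(), "classmate": Counter()}
    last_end = {}  # (interval index, category) -> end of last coded behavior
    for start, end, category in sorted(events):
        interval = int(start // INTERVAL_SECONDS)
        who = "target" if interval % 2 == 0 else "classmate"
        key = (interval, category)
        if key in last_end and start - last_end[key] < MIN_BREAK_SECONDS:
            # Too close to the previous behavior: treat as the same occurrence.
            last_end[key] = max(last_end[key], end)
            continue
        tallies[who][category] += 1
        last_end[key] = end
    return tallies
```

Because the break is tracked separately for each category, behaviors in different categories can still be coded at the same moment, as the procedure requires.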



Chapter 10
Data Utilization

This chapter summarizes IRLD research findings related to the use
of data collected on students to make decisions regarding pupil
progress and program success. Four specific questions are addressed
in this chapter:

     • What are recommended procedures for graphing data?

     • How should graphed data be used to evaluate students' programs?

     • How should teachers be trained to use data for judging
       intervention effectiveness and improving student performance?

     • To what extent do measurement and data utilization by
       teachers affect students' learning?

For each question, the major findings are summarized and the data
sources from which the findings were obtained are listed (generally
ordered in terms of recency). Specific evidence for the major
findings then is presented.

What are Recommended Procedures for Graphing Data?

Findings:

     a. Correct performance should be graphed. Incorrect
        performance may also be graphed along with correct
        performance to provide information about accuracy of
        performance.

     b. When graphing a student's level of performance, equal
        interval graph paper should be used rather than
        semi-logarithmic chart paper.

     c. When graphing a student's reading or spelling progress
        through a curriculum, number of words spelled or pages
        read should be spaced along the ordinate axis according
        to the time of mastery expected of average students in
        the curriculum.

     d. Students may be taught to chart their own performance to
        increase teacher efficiency and facilitate student
        satisfaction.

Data Sources:

     • Comparison of student self-management techniques (RR 115)

     • Direct measure reliability study (RR 109)

     • Comparative study of graph papers (RR 101)

     • Aggregation study (RR 94)

     • Direct measures norm development (RR 87)

     • Comparative study of written expression scoring procedures
       (RR 84)

     • Comparative study of three reading placement procedures

     • Reliability of written expression measures (RR 50)

     • Technical characteristics of direct written expression
       measures (RR 22)

     • Technical characteristics of direct spelling measures (RR 21)

     • Technical characteristics of direct reading measures (RR 20)

Evidence:

Studies in the area of reading, spelling, and written expression
consistently have indicated that correct performance has greater
technical adequacy than does incorrect performance (RR 20, 21, 22, 50,
57, 84, 87, 94, 109). Thus, graphing of data should focus on correct
performance.

Studies of graphing procedures within performance measurement
have examined the relative merits of equal interval and
semi-logarithmic graph paper (RR 101). Analyses of deviations between
actual scores and scores predicted from graphs on each type of paper
indicated that predictions were more accurate when data had been
graphed on equal interval paper.

Within progress measurement, a critical problem is the lack of
equal intervals from one curriculum unit to the next. It appears that
the technical adequacy of progress measurement might be improved if
the system were conceptualized as progress through pages read or words
spelled of a curriculum, with the number of pages or words spaced
along the ordinate axis according to the time of mastery expected of
average students in the curriculum. Research should be conducted to
examine the validity of this assumption.
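
One way to picture this rescaling is as a lookup from pages read to a position on the ordinate expressed in expected time of mastery. The sketch below is hypothetical: the checkpoint values are invented for illustration, and linear interpolation between checkpoints is an assumption, since the research reports do not specify how the spacing would be computed.

```python
from bisect import bisect_right


def ordinate_position(pages_read, checkpoints):
    """Map pages read to an ordinate position in expected weeks.

    checkpoints: list of (cumulative_pages, expected_weeks) pairs for
    average students, sorted by pages and starting at (0, 0).
    """
    pages = [p for p, _ in checkpoints]
    weeks = [w for _, w in checkpoints]
    if pages_read <= pages[0]:
        return float(weeks[0])
    if pages_read >= pages[-1]:
        return float(weeks[-1])
    i = bisect_right(pages, pages_read)  # first checkpoint beyond pages_read
    p0, p1 = pages[i - 1], pages[i]
    w0, w1 = weeks[i - 1], weeks[i]
    # Interpolate linearly between the two surrounding checkpoints.
    return w0 + (w1 - w0) * (pages_read - p0) / (p1 - p0)
```

With invented checkpoints of [(0, 0), (40, 4), (60, 8)], the 20 pages from 40 to 60 take up as much of the axis as the first 40 pages, because average students are assumed to need the same four weeks for each stretch.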

Although having students graph their own performance data does
not necessarily result in increased student achievement (RR 115), it
may facilitate increased student satisfaction and reduce teacher time
for evaluation activities. Evidence suggests that by increasing
student responsibility in charting tasks, increased student
achievement also can be attained.

18. How Should Graphed Data be Used to Evaluate Students' Programs?

Findings:

     a. Graphed data should be summarized and interpreted to
        determine whether the instructional program is effective
        or needs to be changed.

     b. Goal-oriented analysis is preferred for monitoring
        progress toward IEP goals, obtaining information about
        when to change a student's instructional program, and
        explaining student progress to parents and other
        teachers.

     c. Program-oriented analysis is preferred for obtaining
        information about what to change in a student's
        instructional program.

     d. A combined goal-oriented and program-oriented procedure
        that is recommended involves drawing a trend line
        through 7 to 10 data points; if the trend is flatter
        than the goal line, a program modification should be
        introduced.

     e. Data obtained from several students can be used to make
        decisions regarding general program components.
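
The combined rule in finding d can be sketched as a comparison of two slopes. The sketch is an illustration under assumptions: the research reports do not say how the trend line is drawn, so an ordinary least-squares fit over equally spaced sessions is used here as a stand-in, and the function and parameter names are hypothetical.

```python
def least_squares_slope(scores):
    """Slope of an ordinary least-squares line through equally spaced scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two data points are required")
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(scores))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def goal_slope(baseline, goal, sessions_to_goal):
    """Slope of the goal line from the baseline score to the goal score."""
    if sessions_to_goal <= 0:
        raise ValueError("sessions_to_goal must be positive")
    return (goal - baseline) / sessions_to_goal


def needs_program_change(scores, baseline, goal, sessions_to_goal,
                         min_points=7, max_points=10):
    """Return True if the recent trend is flatter than the goal line,
    False if it is not, and None if fewer than min_points scores exist."""
    if len(scores) < min_points:
        return None
    recent = scores[-max_points:]  # use at most the last 10 data points
    return least_squares_slope(recent) < goal_slope(baseline, goal, sessions_to_goal)
```

For example, seven scores rising from 20 to 23 have a fitted slope of about 0.5 per session; against a goal line that requires 1.0 per session, the rule would call for a program modification.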

Data Sources:

     • Analysis of statistical properties of data (RR 138)

     • Norming study (RR 132)

     • Evaluation of program effectiveness (RR 123)

     • Assessment of alternative data summary procedures (RR 112, 118)

     • Comparative study of data utilization rules (RR 64)

     • Comparative study of teacher goals (RR 61, 62)

     • Analysis of program components (RR 12)

     • Demonstration study of data utilization (RR 10)

Evidence:

When teachers summarize student data and implement
data-utilization rules, student performance increases more than when
data utilization does not occur (RR 10) or when constant efforts are
made to improve the student's performance without data-utilization
procedures (RR 64).

Two basic procedures may be used to summarize student data:
visual analysis or statistical analysis. One investigation of visual
analysis (RR 112) revealed that it is not very reliable for evaluating
educational programs, and that it is influenced considerably by the
characteristics of the data array (e.g., slope and variability).
Another study (RR 118) indicated that the relationship between results
of visual analysis and statistical analysis procedures was modest at
best. In other words, many interventions were judged visually to be
significant in their effects when they were not statistically
significant, and vice versa. Further research has suggested that
specific factors can affect the accuracy of visual analysis (RR 125).
Given that training in statistical analysis and access to statistical
programs are limited for most teachers, it is likely that graphed data
generally will be analyzed visually. Thus, training becomes
especially important. Initial research has indicated that training
can increase the accuracy of visual analysis judgments (RR 118).
Additional research is needed to identify specific training components
that can increase even further the accuracy of visual analysis for
summarizing student data. For example, an analysis of statistical
properties of data has suggested that some training should focus on
the interactions among time-series characteristics when making
judgments based on visual inference (RR 138).

In analyzing graphed data, teachers who used both goal-oriented
procedures (set the goal and a date on which it is to be reached, draw
a goal line, and then compare student performance trends to the goal
line) and program-oriented procedures (test student performance
frequently and change the program when it appears needed or after a
specified number of tests, usually 7-10) reported that they preferred
the goal-oriented approach for (a) monitoring student progress toward
IEP goals, (b) obtaining information about when to change a student's
instructional program, and (c) explaining student progress to parents
and teachers (RR 64). They also indicated that the goal-oriented
approach was easier to use, more efficient, and a more accurate
representation of student performance. The program-oriented approach
was preferred only as a guide for what to change in a student's
instructional program.

Teachers also were more accurate in summarizing data when using
goal-oriented procedures (47% correct summarizations) than when using
program-oriented procedures (12% correct summarizations) (RR 64).
Further, the timing of changes in students' programs was more accurate
in goal-oriented analysis (70%) than in program-oriented analysis
(33%). In another study, teachers who used goal-oriented analysis
made more correct decisions about whether to change a student's
program (79%) than teachers who used program-oriented analysis (68%)
(RR 61). Teachers also judged effective interventions more accurately
when they applied goal-oriented procedures (100%) than when they
applied program-oriented procedures (80%) (RR 61). Further, teachers
believed they were more effective when using a goal-oriented approach
than when using a program-oriented approach, even though there
actually were no student performance differences (RR 61, 62).

Program component research is a viable outcome of data collected
through direct and frequent measurement procedures. An illustration
of this approach in a Child Service Demonstration Center for Children
with Learning Disabilities (RR 12) indicated that it could provide
immediate payoff for decision makers and could be used to identify
effective intervention variables within a program. System-wide data
collection also has revealed that the data-based assessment approach
offers not only a measurement alternative for the student, but a
comprehensive reviewing procedure that is sensitive to the needs and
problems of the school system as a whole (RR 123). Local normative
data can be obtained by sampling as few as 20 students per grade, with
the result providing a median for normative comparisons that is very
close to that of the entire class or to an estimated true median (RR
132).

How Should Teachers Be Trained to Use Data for Judging Intervention
Effectiveness and Improving Student Performance?

Findings:

     a. Direct inservice or workshop training, rather than
        self-instruction, is recommended for training teachers to
        collect data frequently and to use the data to make
        instructional decisions.

     b. Systematic procedural changes can increase teachers'
        efficiency in using direct and frequent measurement
        procedures.

     c. Direct training of teachers in measurement activities
        is more likely to result in teacher use and efficiency
        than training through manuals alone.

     d. Goal setting is integral to progress measurement
        activities; teachers should monitor student performance
        in relation to short-term objectives rather than
        long-term goals.

     e. Direct and frequent measurement with curriculum-based
        tests can increase the reliability of scores and may
        provide the best measure for determining reading
        placement.



Data Sources:

     • Experimental study of formative evaluation effects (RR 88,
       95, 97, 111, 115)

     • Comparative study of data-utilization rules (RR 64)

     • Study of self-instructional training (RR 63)

     • Comparative study of teacher goals (RR 61, 62)

     • Comparative study of three reading placement procedures (RR 56)

     • Teacher efficiency studies (RR 53)

     • Interviews of special educators (RR 41)

     • Development of data utilization systems (RR 23)

Evidence:

An early study of data utilization provided participating
teachers only with minimal training (IH hrs) in data analysis for
making program decisions (RR 23). Results indicated that, in general,
teacher use of decision rules was more effective than teacher judgment
in improving student performance. These tenuous findings suggested
that a fruitful area of research, with respect to the development of
an effective evaluation system, would be to test alternative data
utilization approaches that involve more intensive teacher training in
monitoring and evaluating student progress. In fact, when interviewed
one year after the study, several teachers indicated the need for more
intensive and relevant training, including modeling of the procedures
(RR 41).

Teachers have been trained via (a) a week-long workshop prior to
start-up in the fall and semi-weekly workshops throughout the school
year (RR 53), (b) a self-instructional manual plus four workshops (RR
64), and (c) training of district personnel who in turn directly train
teachers using the self-instructional manual (RR 88, 96, 97).
Regardless of training procedure, teachers have had difficulty using
the data systematically. However, direct training of teachers was
more effective in promoting efficiency in measurement tasks (RR 53).
For this reason, direct training is recommended. Increased attention
to data utilization during training is needed. Imprecise teacher
implementation of measurement and decision rules has been observed
even in controlled studies (RR 61, 88, 111, 115), thus emphasizing the
need for on-going training in measurement, graph interpretation, and
data utilization procedures.

A series of studies on teacher efficiency in employing direct and
frequent measurement strategies indicated that teachers initially
required 13^ minutes to prepare, administer, score, and graph
measurement tasks on four academic behaviors for one student; the
number of students on teachers' caseloads made the time commitment
burdensome (RR 53). In addition, this time commitment did not include
the time needed to read and analyze graphs, important tasks if the
data are to be employed meaningfully to improve student progress.
Procedural changes, such as administration of reading and spelling
tasks prior to the written expression task, and measurement at the
beginning of the period, resulted in greater teacher satisfaction.
Other factors suggested as effective in increasing teacher efficiency
were precounting the words in oral reading passages, group
administration of tasks, the use of mechanical devices to administer
the measures, and student graphing of measurement results. Direct
training of teachers in efficient procedures was more effective than
training via a self-instructional manual and periodic inservices.
Teachers appeared to need prompting to improve their efficiency with
direct and frequent measurement strategies.

In a comparative study, special education teachers who had set
long-term goals for students, graphed students' word recognition
performance, and made a program adjustment every two weeks, predicted
that students would master a greater number of words than did teachers
who set short-term objectives and compared graphed student performance
with a short-term aimline (RR 62). The predictions of the teachers in
the short-term goals group were more accurate. However, no actual
differences were found in student progress (RR 61). Imprecise teacher
implementation of measurement and designated decision rules was
observed, suggesting that on-going training of teachers to measure,
interpret graphs correctly, and use student data consistently is
critical. For example, teachers who were to measure students daily
actually measured only three times per week. And, teachers in the
long-term goal setting group who were to measure daily correctly
employed their decision rules only 56% of the time; teachers who
measured weekly did so 78% of the time. Teachers in the short-term
goal setting group appropriately moved to new reading lists on only
two-thirds of measurement days. Obviously, both the use of
data-utilization rules and specific training on the rules are
essential dimensions of a measurement system effective in improving
student achievement.

In a comparative study of three procedures for placing students
in reading curricula (teacher judgment, standardized testing,
curriculum-based assessment), correlations among the three placement
procedures were high but the agreement among scores from the three
measures was not (RR 56). Placement from curriculum-based measures
agreed better with students' actual reading placements than did
norm-referenced test scores. Achievement test scores and
curriculum-based placement scores agreed for only about one-half of
the students. It is proposed that direct and frequent measurement
strategies provide a resolution to this problem. Since
curriculum-based measures can be used with any curriculum, and a
student's score is calculated as the median of scores on repeated
samplings, measurement error may be reduced, resulting in improved
accuracy of curriculum-based tests for reading placement.

20. To What Extent Do Measurement and Data Utilization by Teachers Affect
Students' Learning?

Findings:

     a. Student performance increases more when teachers use
        specific data-utilization rules to monitor progress than
        when they rely on their own judgment about student
        progress.

     b. The quality of instruction improves when teachers use
        direct and frequent measurement and evaluation.

     c. Students' knowledge about their goals and progress is
        greater when teachers employ direct and frequent
        measurement and evaluation.

     d. Measurement appears to be a necessary condition in
        producing student growth, but not a sufficient one;
        positive effects of measurement cannot be sustained
        unless data-utilization procedures also are used.

Data Sources:

     • Comparison of student self-management techniques (RR 115)

     • Surveys of experimental study participants (RR 114, 124)

     • Experimental study of formative evaluation effects (RR 88,
       95, 97, 111, 116)

     • Instructional rating scale validation (RR 107)

     • Implementation study (RR 106)

     • Causal model analysis (RR 105)

     • Comparative study of data utilization rules (RR 64)

     • Analysis of program components (RR 12)

     • Demonstration study of data utilization (RR 10)

Evidence:

In a comparative study of data-utilization rules, student reading
performance increased more when specific data-utilization strategies
were used than when teachers were making a constant effort to improve
upon the students' current performance levels without data-utilization
procedures (RR 64). Further analyses indicated that time or
maturation alone did not explain the increase in student performance
from the no data-utilization phase to the first data-utilization
phase.

In a demonstration study of the implementation of
data-utilization techniques, students exhibited greater reading
achievement when rules for the utilization of measurement data were
included as part of the formative evaluation system (RR 10). When
teachers measured student reading performance daily in relation to
daily goals and altered both goals and consequences contingent upon
measured student performance relative to goals, superior achievement
occurred. In another study, a significantly higher proportion of
elementary students attained mastery more rapidly when daily
performance was graphed than when it was not graphed (RR 12).
Variations in the implementation of direct and frequent evaluation
procedures also appear to influence student achievement. For example,
in experimental conditions where students selected their instructional
activities and then charted their own performance, significant
increases in student achievement occurred on both direct and
standardized measures (RR 115).

A series of implementation studies indicated that the extent to
which a formative evaluation system is implemented may determine the
extent to which effects are seen in terms of instructional structure
or student achievement (RR 88, 111, 116). The observational scale
used to assess instructional structure in these studies was determined
to be a useful research tool from the standpoint of technical adequacy
and heuristics (RR 107). In the studies, it also was found that the
lack of communication among special educators, regular educators,
administrators, and parents might be reduced through the use of
formative evaluation procedures (RR 114). There was some indication
that, even with minimal implementation of the formative evaluation
system, students were more aware of working toward a goal and were
more optimistic about their progress; teachers also seemed better able
to realistically judge their students' progress (RR 124).

A causal model analysis was conducted on the relationships among
the degree of implementation of the formative evaluation system, the
amount of structure in the students' reading instructional program,
and the students' rate of academic progress over one year (RR 105).
Causal modeling techniques allow inferences to be made about the logic
of directional hypotheses for obtained correlations. Teacher
implementation of measurement procedures, student achievement, and
degree of teaching structure were found to be stable over time (e.g.,
if a teacher designed a highly structured program for a student, that
student continued to receive highly structured instruction throughout
the school year). While measurement had a strong effect on structure
and achievement, these effects were short-lived and not evident at the
end of the study. Specifically, silent reading practice related to
reading achievement gains and the routines of measuring student
progress influenced structure; however, the hypothesis that
measurement would result in increased structure and student
achievement was unsupported. It appeared that measurement activities
were important initially in the implementation of data-based
modification, but that student achievement gains could be sustained
only if evaluation of data occurred.

Another study of the effect of direct and frequent measurement
and evaluation on students' reading achievement, teachers' quality of
instruction, and students' knowledge about their own goals and
progress strongly supported the use of direct and frequent
measurement and evaluation (RR 96, 97). In comparison to 21 teachers
who used typical special education evaluation procedures, 18 New York
City special education teachers who employed frequent curriculum-based
measurement and evaluation procedures (a) effected greater student
reading achievement, (b) delivered more structured reading lessons,
and (c) were more successful in communicating accurate information to
their pupils concerning student goals and progress.




An implementation study also confirmed that teachers generally
found that the data obtained from a data monitoring system in reading
were useful for tracking student progress (RR 106). Some of the
teachers in this study reported that the system was helpful in
communicating with parents and teachers.
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Table 1 

Evaluation Research Data Sources 



Data Source 



Research 
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Quest ions 
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4. 


5. 7. 8, 9. 10. 13. 14. 17 


64 
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5. 8. 18, 19. 20 t 


87 


4, 


5. 7. 8. 10. 17 


132 


4. 


7. 9, 13. 14. 18 
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5. 18. 19 


56. 57 


4. 


5. 17. 19 


49 


4. 


7. 9. 10 


23 


5. 


7. 8. 19 


88,- 96.- 97 . 
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20 


4. 
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10. 17 
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4. 
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15. 


12 

16 . > ; 
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15. 


16 


115 
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20 
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18. 


20 


12 


18. 


20 
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67 


2 




55 


4 




59 


4 




93 

129 ' 


4 
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113, 122, 128, 
Direct measure reliability study
Comparative study of data utilization rules
Direct measures norm development
Norming study
Comparative study of teacher goals
Comparative study of three reading placement procedures
Longitudinal study of learning trends on simple measures
Development of data utilization systems
Experimental study of formative evaluation effects
Technical characteristics of direct reading measures
Aggregation study
Teacher efficiency studies
Technical characteristics of direct spelling measures
Technical characteristics of direct written expression measures
Survey of LD teachers
Interviews of special educators
Surveys of experimental study participants
Comparative study of reading domains and durations
Comparative study of standardized and direct measures
Implementation study
Single subject study
Reliability of written expression measures
Comparative study of written expression scoring procedures
Study of expressive language
Technical characteristics of direct social adjustment measures
Study of variables influencing social adjustment measures
Comparison of student self-management techniques
Demonstration study of data utilization
Analysis of program components
Survey and observation of special ed teachers
Surveys of special educators
Comparative study of reading domains
Study of alternative reading performance criteria
Study of curriculum differences
Analysis of readability formulas
Analysis of basal reader criterion-referenced tests
Measuring classroom behavior
Comparative study of graph papers
Assessment of alternative data summary procedures
Evaluation of program effectiveness
Analyses of statistical properties of data
Study of self-instructional training
Causal model analysis
Instructional rating scale validation
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112. 


118 


18 


123 




18 


125. 


138 


18 


53 




19 


105 




; 20^ 


107 
Data Sources

This chapter provides a summary of the data sources and research
procedures used to obtain the research findings presented in the
previous chapters. An overview of the data sources is provided in
Table 1. The IRLD research reports in which more detailed
explanations may be found are listed in the table, as are the numbers
of the corresponding research questions. The data sources are ordered
within this chapter (and the table) according to the frequency with
which they are cited as evidence for various research questions.
Direct Measure Reliability Study (RR 109)

Two separate investigations were conducted to examine the
test-retest reliability, alternate-form reliability, and interjudge
reliability for direct and repeated measures in the areas of reading,
spelling, written expression, and math.

In Study I (1979-80), a sample of 566 students (275 males)
enrolled in grades 1-6 from three states was administered direct
measures of reading (one-minute tests), spelling (2 tests), and
written expression (2 three-minute tests). Students were selected
randomly from the school districts that volunteered to participate in
the study. The students were approximately equally distributed among
grades 1-6. Each student was administered the measures during late
fall and again during early spring on an individual basis by a
trained examiner.

In Study II (1981-82), 76 students randomly sampled from grades 4
and 5 were subjects in a math reliability investigation. Thirty
students in grade 5 were involved in the test-retest reliability
investigation; the 46 students in grade 4 were involved in the
alternate-form reliability investigation. Measurement materials in
math included computation problems printed on forms. All testing was
group administered, with 10 students tested at a time and a one-week
interval separating the two testing periods. All testing and scoring
was done by trained educational aides. The number of digits correct
and incorrect was computed for each math function.
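The summary above does not state which coefficient was used to index
reliability. As a minimal sketch, assuming test-retest reliability is
summarized as a Pearson correlation between the two administrations,
the computation could look like the following. The function name and
the sample scores are hypothetical illustrations, not data from the
study.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations.

    Assumption: reliability is summarized as a Pearson r; the report
    summary does not name the statistic. Raises ZeroDivisionError if
    either list has no variance.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    return cross / sqrt(ss_1 * ss_2)


# Hypothetical digits-correct scores for five students, one week apart.
week_1 = [12, 18, 25, 31, 40]
week_2 = [14, 17, 27, 30, 43]
print(round(pearson_r(week_1, week_2), 3))
```

The same function would apply to alternate-form reliability by passing
scores from the two forms instead of the two testing dates.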
Comparative Study of Data Utilization Rules (RR 64)

Ten special education teachers in a midwestern rural educational
cooperative implemented direct and frequent measures and data
utilization procedures with at least two students each over the
1980-81 school year. Teaching experience ranged from 0 to 10 years; 8
teachers were female. Students in the study were functioning
dramatically below their peers in academic, language, and/or social
areas.

The teachers were trained to implement frequent measurement
systems during one week of full-day workshops prior to the school
year, and in half-day sessions periodically throughout the school
year. By February 1981, each teacher was measuring and graphing the
students' reading performance at least three times per week. At this
time, two data utilization systems, experimental and therapeutic
analysis, were introduced to the teachers. In therapeutic data
analysis, the teacher's objective was to insure that a student's
performance reached a specified goal by a certain date. In
experimental data analysis, no student performance level and
attainment date were specified; rather, the teacher's objective was to
improve continuously upon a student's current performance level by
introducing and evaluating a series of unending program changes. One
half of the teachers implemented experimental teaching and the other
half implemented therapeutic teaching; after nine weeks of data
collection the teachers switched systems.
Three data utilization strategies (no data utilization,
therapeutic, experimental) were compared in terms of their effects on
the number of modifications teachers made. Every two weeks, IRLD
staff inspected each student's graph and counted the number of
instructional changes made. To assess the effect of the data
utilization strategies on student performance, every two weeks
teachers measured the students' oral reading rate correct on a random
list of K-3 words. At the end of the school year, teachers completed
surveys regarding their preferences for different measurement
strategies.

Direct Measures Norm Development (RR 87)
During 1979-80, direct measures of reading, spelling, and written
expression were administered to 566 elementary students from three
states in order to (a) investigate the feasibility of using a standard
task to measure the reading, spelling, and writing proficiency of
elementary children, and (b) describe procedures for establishing
local norms on the standard tasks. The grade 1-6 students from
Minnesota, Pennsylvania, and Washington were selected randomly from
school districts that volunteered to participate in the study. There
were 275 males and 291 females in the total sample, which included 92
first graders, 85 second graders, 96 third graders, 99 fourth graders,
101 fifth graders, and 95 sixth graders.
The Minnesota sample consisted of 134 of the 566 students, 63
boys and 71 girls. Most of these subjects (73%) were selected from
two urban areas with populations of 50,000 and 100,000 people. These
elementary students were approximately equally distributed among
grades 1 to 6. The Pennsylvania sample of students included 157 boys
and 169 girls, equally distributed across the six grade levels. These
elementary students were randomly selected from two areas (rural and
urban) in central Pennsylvania. The remaining 106 elementary students
tested were from the Seattle, Washington area; 55 were male and 51
were female.

Each child was administered direct measures of reading, spelling,
and written expression during the fall and the spring on an individual
basis by an examiner trained in the administration of the measures.
Data were examined in terms of grade level differences, annual growth,
stability over time, and state, demographic, and sex differences.
Norming Study (RR 132)

During 1982-83, fall, winter, and spring local norms for student
performance on direct measures of reading, spelling, math, and written
expression were developed. Samples of regular education students from
six school districts were asked to (a) read aloud from two basal
reading passages, (b) spell words from a dictated word list taken from
either a spelling series or a reading series, (c) complete math
problems in addition, subtraction, multiplication, and division, and
(d) complete a written composition in response to a story starter.

A total of almost 1800 students participated in this local
norming, with approximately equal numbers from each grade (1-6). Data
were summarized on the effect of using different measurement sampling
plans, the reliability of the measures, and the distribution of scores
within a grade level. Also, the effects of different population
sampling plans were analyzed. The local norms also were compared to
national norms, and the effects of the norms on the percentages of
students served were examined.

Comparative Study of Teacher Goals (RR 61, 62)

During 1979-80, 20 special education resource teachers from a
midwestern metropolitan area participated in a 12-week study to
examine the effects on student reading achievement of (a) goal size
and data-utilization rule, and (b) measurement frequency. The
majority of teachers were female; they had an average of 9.6 years
teaching experience. Each teacher selected up to six students from
his/her caseload, resulting in a student sample of 88 boys and 20
girls. The students' mean age was 10.3 years; their mean grade level
was 3.9.

Teachers were assigned randomly to one of two experimental
treatment groups for the purpose of measuring student progress:
Long-Term Goal Measurement (LTGM) or Short-Term Goal Measurement
(STGM). In LTGM, teachers tested students' oral reading performance
by administering a 30-second word recognition test comprised of 25
words randomly selected from the large set of words to be introduced
within the 12-week study. Teachers in this condition were required to
make an instructional intervention every 10 days. In the STGM group,
teachers tested a student's reading performance by administering a
30-second word recognition test comprised of 25 words that included
vocabulary words introduced in the current instructional period plus
words sampled from preceding stories. Teachers compared the student's
performance against a short-term aimline related to the current
short-term goal and made program adjustments accordingly. Both groups
of teachers randomly assigned their students to one of three frequency
of measurement conditions: daily, weekly, or pre-post measurement.
During the first, seventh, and twelfth weeks of the study, teachers
administered curriculum-based measures (both word recognition and oral
reading passages) to all students in the study.

Teacher decision-making information was assessed weekly through
the use of an interview checklist. Specific questions related to how,
why, and when program adjustments were made and teacher re-estimates
of long-term and short-term goals. Teachers also rank ordered the
five most effective student program changes for each student from
among eight instructional, eight motivational, and eight
administrative and physical arrangement alternatives. These rankings
occurred after the 3rd, 6th, 9th, and 12th weeks of the study.

Comparative Study of Three Reading Placement Procedures (RR 56, 57)

Two comparative studies involving the accuracy of reading
placements were conducted during 1980-81 with 91 randomly selected
students, distributed across grades 1-5 in one midwestern metropolitan
public elementary school. All students were English speaking, 15
students received special education resource service, and 23 were
enrolled in Title I programs for children who were "seriously behind"
in reading.

In the first study, the correlations and agreements among scores
on curriculum-based measures, scores on technically adequate
achievement tests, and teacher judgments (actual placements) were
investigated. Five trained examiners administered two standardized
subtests and 10 reading passages during an hour-long session. The
reading passages were administered for one minute each in a random
order, following systematic procedures. Seven instructional criteria
(e.g., 70 wpm with 10 or fewer errors) were applied to the scores from
these passages; the students' placement score was the highest level at
which a criterion was met before unsatisfactory performance on two
consecutive levels.

In the second study, the concurrent validity of curriculum-based
reading measures was examined for two basal reading programs. The
measures and procedures employed were identical to the first study
with one exception: in this study, two reading series were involved.
For each passage, the seven different instructional criteria were
applied to the students' scores. An instructional level was identified
as the highest level at which the criterion was met before an
unsatisfactory performance was demonstrated on two consecutive levels.
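The placement rule described above can be sketched in a few lines.
This is only an illustration under stated assumptions: passages are
taken to be ordered from easiest to hardest, the example criterion
(70 wpm with 10 or fewer errors) is treated as words read per minute
and an error count, and the function names and scores are hypothetical.

```python
def meets_rate_criterion(words_per_minute, errors, min_wpm=70, max_errors=10):
    """Example criterion from the text: 70 wpm with 10 or fewer errors.

    Assumption: "wpm" is words read per minute on a one-minute passage.
    """
    return words_per_minute >= min_wpm and errors <= max_errors


def placement_level(meets_criterion):
    """Return the index of the highest level at which the criterion was
    met before two consecutive unsatisfactory levels, or None if the
    criterion was never met before that point.

    meets_criterion: list of booleans, one per passage level, ordered
    from the lowest level to the highest (an assumption).
    """
    placement = None
    consecutive_failures = 0
    for level, met in enumerate(meets_criterion):
        if met:
            placement = level
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures == 2:
                break  # stop at the second unsatisfactory level in a row
    return placement


# Hypothetical (wpm, errors) results on seven passage levels.
results = [(95, 3), (82, 6), (64, 9), (73, 8), (51, 14), (40, 20), (75, 2)]
met = [meets_rate_criterion(wpm, err) for wpm, err in results]
print(placement_level(met))  # prints 3
```

In this example the isolated miss at level 2 does not end the search,
but the misses at levels 4 and 5 do, so the later success at level 6 is
ignored.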

Longitudinal Study of Learning Trends on Simple Measures (RR 49)

During 1979-80, 58 children randomly selected from the elementary
schools of a small midwestern city were tested on direct measures of
reading, spelling, and written expression. The grade 1-6 students
ranged in age from 6.3 years to 12.2 years. None of the students was
receiving special education services. The direct measures used were
those described in RR 20, 21, and 22. All measures were administered
in the fall, winter, and spring of the school year by researchers.
Development of Data-Utilization Systems (RR 23)

During the 1978-79 school year, the effectiveness of direct and
frequent spelling measurement procedures was investigated in LD
resource programs. Twenty-two volunteer special education resource
teachers and 80 grade 2-6 students receiving spelling instruction
participated in the study. The students, who were of low to middle
income SES within a large metropolitan area, were at least two years
below age/grade placement in spelling achievement. The number of
students per teacher ranged from two to seven. Three-quarters of the
students were male; most teachers were female. All teachers had
taught special education for a minimum of three years.



Prior to the first experimental period, each student's spelling
performance was assessed on grade specific tests. Instructional
placements were determined by the rate of letter sequences spelled
correctly. Instruction occurred during a daily spelling period using
100 words. Teachers were responsible for decisions such as the number
of words to introduce daily, and were encouraged to change the
instructional program as needed. To facilitate program change, a list
of 12 spelling interventions was distributed to each teacher in a
checklist format. Teachers were instructed to check the strategies
they used for each student daily.

Three different formative evaluation systems were designed and
implemented as treatments. Teachers were assigned randomly to use one
system for a three-week period; they were trained during a workshop.

In the first system, daily measurement and data-based rules
(DMDB), teachers taught for 10 minutes and used the remaining five
minutes for testing. A weekly spelling goal was established and
teachers used an aimline to indicate the need for an instructional
intervention. If the student's performance fell below the aimline for
three consecutive days, the teacher drew a new aimline and implemented
a different teaching strategy. If performance was above the line no
new teaching strategy was implemented.
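The data-based rule lends itself to a short illustration. The sketch
below assumes the aimline is a straight line from the student's
starting score to the weekly goal, which the summary above does not
specify; the function names and scores are hypothetical.

```python
def aimline_value(day, start_score, goal_score, goal_day):
    """Expected score on a given day along a straight-line aimline.

    Assumption: the aimline rises linearly from start_score (day 0)
    to goal_score (goal_day).
    """
    return start_score + (goal_score - start_score) * day / goal_day


def needs_new_strategy(scores, start_score, goal_score, goal_day, run_length=3):
    """Return True if scores fall below the aimline on run_length
    consecutive days (three in the DMDB rule described above).

    scores: daily scores, with scores[0] taken as day 1.
    """
    run = 0
    for day, score in enumerate(scores, start=1):
        if score < aimline_value(day, start_score, goal_score, goal_day):
            run += 1
            if run >= run_length:
                return True
        else:
            run = 0  # a score on or above the line resets the count
    return False


# Hypothetical week: baseline of 20 letter sequences, goal of 30 by day 5.
print(needs_new_strategy([23, 23, 25, 27, 31], 20, 30, 5))  # prints True
print(needs_new_strategy([23, 23, 27, 27, 31], 20, 30, 5))  # prints False
```

In the first series the scores on days 2, 3, and 4 all fall below the
line, which triggers a change; in the second the day 3 score reaches
the line and resets the count.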

In the second treatment, daily measurement and teacher judgment
(DMTJ), the same measurement procedures were used as in the DMDB
treatment; however, rules were not specified regarding when to change
teaching strategies. Teachers graphed student performance and were
asked to judge whether the students' progress was sufficient to
continue using the same teaching methods, or whether a new teaching
strategy would increase performance.

In the third treatment, weekly measurement and teacher judgment
(WMTJ), measurement of spelling performance occurred only once during
the week and the students' scores were recorded in a grade book. The
teacher judged the need for an instructional program change consistent
with the guidelines from the DMTJ condition.

For a second three-week period, half of the teachers in each
treatment were randomly reassigned to one of the other two treatments
and were again trained in the procedures. Thus, at the conclusion of
the study, each teacher had participated in two of the three
treatments. Students were tested by researchers before the study and
after each experimental period on three grade specific tests and a
grand master test that included words from all elementary grade
levels.

Experimental Study of Formative Evaluation Effects (RR 88, 96, 97,
111, 116)

An experimental-control comparison was conducted during 1981-82
to determine the effects of training teachers in the use of continuous
direct measures in reading on student achievement and the structure of
the learning environment. The subjects included three different
samples; these are described below. After extensive training in the
use of direct measurement procedures, teachers were directed to
measure experimental students daily using one-minute timed samples of
reading from the student's curriculum, to develop IEP long-range goals
and short-term objectives, and to use the data to evaluate the
instructional program over the entire school year. Visits by
observers and frequent phone contacts provided feedback to the
teachers on the accuracy of their implementation of the measures.

Both experimental and control subjects were administered two
achievement measures (timed samples and subtests from a standardized
test) and the Structure of Instruction Rating Scale. In addition, the
Accuracy of Implementation Rating Scale was completed for experimental
subjects. The Structure of Instruction Rating Scale (SIRS) was
designed to measure the degree of structure of the instructional
lesson that a student received. The observers rated 12 factors on a
scale of 1 (low) to 5 (high). Inter-rater agreement was high (.92);
in addition, the reliability of the SIRS as indicated by measures of
homogeneity was .56. The Accuracy of Implementation Rating Scale
(AIRS) was designed to assess the degree of implementation of the
continuous direct measures. The AIRS consisted of 12 items rated on a
1 (low) to 5 (high) scale. Parts of the scale require direct
observation whereas other items on the checklist are completed by
inspection of student reading graphs and reading IEP forms. The
reliability of the AIRS as indexed by internal consistency of items
was .62, which is adequate for research purposes.

Sample 1 (RR 88, 116). The subjects were 40 grade 1-8 students
in a rural educational cooperative, representing 20 experimental-
control matched pairs. Three fourths of the students were boys and
the mean grade level of the students was 3.8. All subjects were
functioning dramatically below their peers in reading. The students
were studied in the resource room setting; their teachers were seven
special education resource teachers whose experience ranged from two
to six years.

Sample 2 (RR 96, 97). A total of 39 special education teachers
and their students, from a large urban school district in the eastern
part of the U.S., participated in the study. Most of the teachers
were female; students selected from their caseloads read about three
years below grade level (fifth grade). Students were in programs for
the emotionally handicapped or the brain-injured, or were placed in
resource rooms.

Sample 3 (RR 111, 116). The subjects were 38 elementary grade
1-6 students in a suburban school district. Most of the students
(84%) were male.
Technical Characteristics of Direct Reading Measures (RR

Three concurrent validity studies of direct reading measures were
conducted during 1978-79 in order to examine (a) relationships between
the direct measures and standardized achievement measures, (b)
resource vs. regular program differences in student performance, and
(c) grade level differences in student performance.

In the first study, 18 regular class students and 15 LD program
students in grades 1-5 from a suburban public school were tested on
five direct measures of reading (words in isolation, words in context,
oral reading, cloze comprehension, and word meaning) and two
standardized measures (Stanford Diagnostic Reading Test, Woodcock
Reading Mastery Tests). In the second study, 27 regular students and
18 LD program students in grades 1-6 from two urban public schools
were tested on the same five direct measures as used in Study I, but
with some minor modifications made in them. No standardized tests
were used in Study II. In the third study, 43 regular students and
LD program students in grades 1-5 from three urban schools were tested
on four direct measures of reading (third-grade word list, sixth-grade
word list, third-grade oral reading passage, sixth-grade cloze
passage) and three standardized measures (Phonetic Analysis and
Reading Comprehension subtests of Stanford Achievement Test and
Reading Comprehension subtest of Peabody Individual Achievement Test).



Aggregation Study (RR 94)

The effects of aggregation on the reliability of measures of
academic performance were explored in two studies during 1980-81. In
the first study, subjects were 30 elementary-age students randomly
selected from a pool of 99 students involved in another study; the
students were all English speaking and attended a midwestern
metropolitan school. The students were tested four times on the same
forms of a reading passage measure and a standardized achievement
test. Group stability coefficients, within-subject reliability
coefficients, and group correlations between variables each were
calculated on the basis of one or two testings and then on the basis
of aggregations over four testings.

In the second study, 78 children in grades 3-6 who were described
as "high-risk" for receiving special education services were tested
10 times on alternate forms of two direct reading measures and one
written expression measure. Once per week over a 10-week period,
students read aloud words for one minute; two measures of reading,
words read correctly per minute and errors per minute, were scored.
During each testing session, a writing sample was obtained. Each
student was presented with an alternate form of a story starter and
required to write on the story topic for three minutes. The number of
correctly spelled words was scored. Group stability coefficients were
calculated on the basis of 2, 4, 6, 8, and 10 testings.
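The summary does not say how the aggregated stability coefficients
were formed. One plausible reading, shown here purely as a sketch, is
to average each student's scores over two sets of testings and
correlate the two sets of means across students. The odd/even split,
the function names, and the scores below are all assumptions made for
illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


def aggregated_stability(scores, k):
    """Stability coefficient based on the first k testings (k even).

    scores: one list of repeated scores per student.
    Assumption: odd-numbered testings are averaged and correlated with
    the average of even-numbered testings.
    """
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even number of testings, at least 2")
    half = k // 2
    odd_means = [sum(s[0:k:2]) / half for s in scores]
    even_means = [sum(s[1:k:2]) / half for s in scores]
    return pearson_r(odd_means, even_means)


# Hypothetical words-read-correctly scores: four students, four testings.
scores = [
    [13, 8, 7, 12],
    [17, 24, 23, 16],
    [34, 27, 26, 33],
    [37, 42, 43, 38],
]
for k in (2, 4):
    print(k, round(aggregated_stability(scores, k), 3))
```

With these made-up scores the coefficient based on four testings is
higher than the one based on two, which is the pattern aggregation is
meant to reveal: averaging over occasions cancels some day-to-day
fluctuation.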

Teacher Efficiency Studies (RR 53)

A series of studies examined teacher efficiency in employing
repeated curriculum-based measurement. The studies involved a group
of 10 special education teachers in a midwest rural educational
cooperative (see p. 1). In addition, five female teachers in a
suburban school district served as a contrast group.

Dependent measures included teacher efficiency (teacher time and
student transition to task time) and teacher satisfaction. Teacher
time data were obtained through observations; student transition to
task time was estimated by teachers on a self-report questionnaire.
Teacher satisfaction was measured using two self-report surveys: the
first measured teacher satisfaction with the efficiency modifications
immediately following the experimental phases and the second obtained
information on actual teacher practices several weeks following
experimental phases.

After teachers were trained to organize, administer, score, and
graph academic measures, their efficiency in using the procedures and
the reliability of self-observation were measured. Teachers
administered the measurement tasks in any order they preferred.
During the following week, teachers administered the tasks to the same
student in a prescribed order (reading, spelling, then written
expression). The prescribed order was designed to allow teachers to
use the students' response time for the written expression task to
score and graph previously administered tasks. Efficiency also was
assessed as a function of when measurement occurred. In week one,
teachers administered the three measurement tasks at the middle or end
of the instructional period. During the next week, the teachers
administered the tasks as soon as the student entered the room. In
addition to recording the amount of time taken, the teachers completed
a teacher satisfaction survey.

After obtaining the results from these comparisons, teachers
selected ways in which they would try to increase their efficiency.
These were studied in 8 single case studies using an ABA reversal
design. Each phase lasted about two weeks, during which time
approximately six data points were collected.

One year after their original training, the efficiency of the 10
teachers was compared to that of five suburban teachers who had been
trained via a self-instructional manual and periodic inservices, and
who had not been systematically prompted to improve their efficiency.
Both groups monitored their own measurement activities to arrive at a
time representative of their end-of-the-year efficiency.

Technical Characteristics of Direct Spelling Measures (RR 21)

Three concurrent validity studies of direct measures of spelling
were conducted during 1978-79 in order to examine (a) relationships
between the direct measures and standardized achievement measures, (b)
resource vs. regular program differences in student performance, (c)
grade level differences in student performance, and (d) various
scoring procedures, time limits, and word lists.

In the first study, regular class students and 15 LD program
students from two urban public schools were tested on two direct
spelling measures (dictated word lists and a picture stimulus written
sample) and one standardized measure (Test of Written Spelling). In
the second study, 35 regular students and 10 LD program students in
grades 2-6 from two different urban public schools were tested on
four word lists (selected from various grade levels) and the spelling
section of the Peabody Individual Achievement Test. In the third
study, regular students and 29 LD program students in grades 2-6 from
two urban public schools and four urban parochial schools were tested
on four word lists (3 of which had been used in Study II, plus one
developed by selecting randomly from a basal reading series) and the
spelling section of the Stanford Achievement Test.
Technical Characteristics of Direct Written Expression Measures (RR
22)

Three concurrent validity studies of direct measures of written
expression were conducted during 1978-79 in order to examine (a)
relationships between the direct measures and standardized achievement
measures, (b) resource vs. regular program differences in student
performance, (c) grade level differences in student performance, and
(d) various scoring procedures.

In the first study, 16 regular class students and 12 LD program
students in grades 3-6 from two urban schools were given two direct
measures of written expression (story starter and picture stimulus)
and one standardized measure (Test of Written Language). Six scoring
procedures were applied to the written samples (T-unit length, mature
words, large words, words spelled correctly, total words written, and
rates of words written). In the second study, 24 regular class
students and 28 LD program students in grades 3-6 in one urban public
school were tested on three direct measures (story starter, picture
stimulus, and topic sentence) and two standardized measures (Test of
Written Language, and Language section of Stanford Achievement Test).
Scoring procedures used were identical to those of Study I. In the
third study, 51 regular class students and 31 LD program students in
grades 3-6 from five urban elementary schools were tested with the
same direct measures and standardized measures as in Study II. In
addition, the Developmental Sentence Scoring System was employed as an
additional validation measure.
Survey of LD Teachers (RR 80)

During 1980-1981, 128 teachers of learning disabled students
completed a survey on instructional program planning and
implementation practices. The survey was sent to teachers randomly
selected from the national membership list of the Council for Learning
Disabilities (CLD) of the Council for Exceptional Children; a
follow-up reminder was sent. The responding teachers were from 42
states distributed fairly evenly among rural, suburban, and urban
school districts. The majority of teachers were female, held graduate
degrees, taught in elementary schools, and provided direct service
instruction to learning disabled students. The average number of
years of experience teaching special education students was 6.3 years.

A comprehensive eight-section survey was designed after
interviews with 25 learning disabilities teachers. Each responding
teacher randomly selected one student (according to specific
guidelines) from his/her caseload and provided information about this
student's program, including school and teacher information, student
information, selection of IEP goals and objectives, program
description, determinants of the program, changes in the original
instructional plan, evaluation of progress, and other topics (e.g.,
teacher satisfaction, general comments). Teachers were provided with
a repertoire of responses for some questions; however, the list was
not viewed as exhaustive and teachers were encouraged to use "other"
as a response.



Interviews of Special Educators (RR [illegible])

During 1980, 18 elementary teachers were interviewed about their
participation in a 1979 study investigating the effects of
measurement systems on instructional decision making with LD students.
The major purpose of the 15-question structured interview was to
determine the teachers' perceptions of the strengths and weaknesses of
the original study and to further ascertain whether the research had
any effects on individual teaching styles. The interviewers were not
staff members of the Institute and had no prior involvement with the
original study. Each interview lasted about one half hour.

Surveys of Experimental Study Participants (RR 114, 124)

Students, parents, teachers, and administrators in four rural and
suburban Minnesota school districts provided survey information
related to an experimental study of formative evaluation effects (see
RR 88, 111, 116). One survey focused on the communication of IEP
goals and student progress. This survey was completed by [number
illegible] parents of experimental students, 25 regular classroom
teachers (16 teachers of experimental students and 9 teachers of
control students), and 11 administrators from three school districts.
The survey and stamped return envelopes were sent to these individuals
at the end of the school year. The surveys differed slightly as a
function of the role of the respondent. Parents completed a 10-item
survey designed to assess their confidence in the placement
committee's decision on the delivery of special education service in
the area of reading, their knowledge of and satisfaction with the
child's year-end reading goal and progress toward it, and their
knowledge of the child's academic status compared to other students of
the same age. Teachers completed an 11-item survey on students they
had referred and who received part-time special education services.
The survey focused on (a) participation in the IEP or periodic review
conference, (b) satisfaction with and usefulness of assessment
information, (c) clarity of and satisfaction with the student's
reading program and progress, and (d) student performance relative to
other children in the classroom. Administrators completed a 9-item
survey focusing on their participation in the students' conferences,
satisfaction with assessment information, clarity of the student's
reading goal and system for monitoring progress, and their views of
parents' understanding of special education services provided to the
student.

A second basic survey focused on the effects of the experimental
study on instruction, teacher estimates of student progress, and
student knowledge of performance. This survey was completed by 31
special education teachers and (through an interview procedure) by
[number illegible] elementary-age resource room students. Teachers
completed three surveys over the course of the year. Two surveys,
completed during the year, focused on [illegible] of functioning in
reading. A 12-item survey was completed by teachers at the end of the
year. It asked teachers to rate and describe how the experimental
procedures were different from their normal evaluation procedures and
to indicate whether, and if so how, they would use the procedures
during the subsequent year. A four-item [illegible] (a) their reading
progress, (b) [illegible] goals, and [illegible] likelihood that they
would attain their reading goals. Two additional items required
interviewers to assess the accuracy of student responses against the
student's reading graphs and [illegible].



Comparative Study of Reading Domains and Durations (RR 48)

Three studies were conducted during 1979-80 to examine the effects
of variations in procedures used for curriculum-based assessment of
reading proficiency. The first study addressed the question of the
influence of sample duration on the concurrent validity and
variability of the measure. Two groups of students served as
subjects. The first group included 27 students randomly selected from
grades 1-6 in two public urban elementary schools in a large
metropolitan area. The second group included 18 students in LD
resource programs in these two schools. The five curriculum-based
measures (Words in Isolation, Words in Context, Oral Reading, Cloze
Comprehension, Word Meaning) were administered individually in one
session to each student. There were two 30-second and two [duration
illegible] parallel forms of the word recognition measures; for the
Cloze measure, each test was two minutes.

The second study addressed the question of the influence of sample
duration on the level, slope, and variability of performance over
repeated measurements. Two second grade, eight year old girls in the
same classroom were selected as subjects because of their consistent
school attendance, similarity [illegible], and seriously delayed
reading performance. Both students received Title I programming
daily. They read from the same reader, worked on the same phonics
categories, and, over a five-week interval, both consistently scored
within five words of each other on weekly one-minute samples of the
number of correct C-V-C words read from flashcards. A multiple
baseline across subjects and reversal design was used and consisted of
four experimental phases: Phase A, a daily 30-second measurement
sample; Phase B, a daily three-minute measurement sample; Phase C,
return to a daily 30-second measurement sample; and Phase D, return to
a daily three-minute sample. Data were collected on the number of
correctly and incorrectly read C-V-C words per minute. The Title I
reading teacher individually collected the data at the end of the
students' [illegible].

The third study was designed to examine the effect that varying the
size of the pool from which items are drawn has on slope and
variability of performance on the measure. Subjects were 20 students
in a metropolitan school district reading at grade 2-4 instructional
levels. Teachers instructed the students using the grade specific
word lists representing their instructional level. Instruction
occurred for 10 minutes daily, followed by teacher administration of
three 30-second lists: one from the appropriate grade level, one from
the appropriate instructional level, and one from the across-grade
domain.

Comparative Study of Standardized and Direct Measures (RR 126)

The effectiveness of direct measurement techniques and standardized
achievement tests for assessing within-individual change was compared
over a 10-week period. A total of 83 grade 3-6 low-achieving students
(ones who performed below the 15th percentile on a [illegible]
expression) from a rural midwestern area were administered the Reading
Comprehension and Language subtests from the Stanford Achievement
Tests and a direct measure of reading (see RR 20) in October and again
in December.

Implementation Study (RR 106)

During 1981-82, educational personnel provided information on the
feasibility and cost effectiveness of a continuous pupil progress
monitoring system that was implemented in two elementary schools and
[illegible] students. A total of 38 educational personnel
participated in the weekly measurement of the students. Included were
teachers, tutors, aides, a school psychologist, and a principal.
Twenty-five of these individuals completed a survey at the end of the
year that focused on their use of information, the time required by
the system, and their reactions to specific aspects [illegible].

Single Subject Study (RR 12[digit illegible])

During 1980-81, the effects of two data [illegible] on spelling
achievement were compared for a [illegible] grader who had been
diagnosed as learning disabled in a midwestern school. The student
spent one hour daily in a resource room, receiving small group
instruction in reading, language arts, and [illegible].

At the beginning of the study, the student received five minutes of
daily direct instruction on a random selection of words from each of
two word packs. Two lists of spelling demons and difficult words were
randomly divided into two word packs, which were assumed to be
equivalent in difficulty. During instructional sessions, the student
was taught and measured on sets of difficult spelling words. Data
were analyzed using a concurrent schedule design whereby equivalent
behaviors are treated simultaneously with different approaches to
determine relative treatment effects. The first approach involved the
following data-utilization rule: If the student's performance fell
below the expected level on three consecutive days, the teacher
introduced a program change. In the second treatment, the teacher
made changes in the student's program every 5 to 10 days. Throughout
the study, the measurement task was an analogous one-minute timing of
the subject's writing randomly selected words from a [illegible].
Dependent data were words correct and errors per minute.
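
The first data-utilization rule described above can be written as a
simple check. The following is a minimal sketch only: the function
name, the representation of daily scores as two parallel lists, and
the sample numbers are illustrative assumptions and do not come from
the original study.

```python
def needs_program_change(observed, expected, run_length=3):
    """Return True when the most recent `run_length` daily scores all
    fall below the expected level for the same days.

    `observed` and `expected` are parallel lists of daily values
    (for example, words spelled correctly per minute). Both names and
    the list format are hypothetical.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if len(observed) < run_length:
        return False  # not enough data yet to apply the rule
    recent = zip(observed[-run_length:], expected[-run_length:])
    return all(obs < exp for obs, exp in recent)


# Hypothetical data: the last three days are all below expectation.
print(needs_program_change([10, 9, 8, 7], [9, 10, 10, 10]))
```

The second treatment (a change every 5 to 10 days) depends only on
elapsed time, so it needs no comparable check against performance.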

_ _ __ j _____ _ _ ' , _ \ 'V^' 

Reliability of Written Expression Measures (RR [illegible])

During 1981, the reliability of four measures of written expression
(Total Words Written, Mature Words, Words Spelled Correctly, and
Letters in Correct Sequence) was examined. The subjects varied for
the four types of reliability examined. Twenty-eight learning
disabled students attending a summer program in a metropolitan
midwestern elementary school were used to examine test-retest
reliability. Parallel form reliability was examined with [illegible]
from a suburban midwestern [illegible]. For test-retest and parallel
form reliability, each student was administered two [illegible] story
starters and was given five minutes to write a composition
[illegible]. The administration of [illegible] was three weeks apart
for the test-retest [illegible]. To determine split-half reliability,
the written compositions of [illegible] students in grades [illegible]
through 6, randomly selected from [illegible] schools in a large
midwestern city, were examined to determine how much each student had
written at the end of minutes 1, 2, 3, 4, and 5. Inter-scorer
reliability was examined for 20 students enrolled in grades 1-6 from a
school in a large city in the [illegible] region of [illegible].

/ > 4 __ ' _ i_ _ _ _ , _ 

Comparative Study of Written Expression Scoring Procedures (RR [illegible])

During 1982, written expression samples from 50 students in grades
3-6 were scored in terms of correct word sequences to investigate (a)
the consistency among scorers using the procedures, (b) the typical
performance levels of students in grades 3-6 on this measure, and (c)
the validity of this measure relative to criterion measures of written
expression. The students were selected randomly from a set of
students who had participated in a previous study. Their average age
was 10 years and their average grade level was [illegible].

Three trained graduate research assistants tested [illegible] on an
individual basis. Students were asked to write [illegible] in
response to a story starter or topic sentence and [illegible] of
Written Language. Each composition was scored using several criterion
measures (Developmental Sentence Scoring, Hunt's mean T-unit length,
[illegible] of written expression, holistic rating scale, words
spelled [illegible], and total words written). The written samples
also were scored by one teacher and one non-teacher for the number of
correct word sequences, which was defined as two adjacent, correctly
spelled words that are acceptable within the context of the phrase to
a native speaker of the English language. In addition, [illegible],
many of [illegible] degrees and were certified in [illegible],
[illegible] written samples according to two [illegible].
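
The definition of a correct word sequence given above (two adjacent,
correctly spelled words that are acceptable in context) lends itself
to a short counting sketch. The code below is an illustration under a
simplifying assumption: a human scorer has already marked each word as
acceptable or not, and a sequence counts when both neighbors are
marked acceptable. The function name and input format are
hypothetical; in the study, the judgment was made by the scorers.

```python
def count_correct_word_sequences(word_ok):
    """Count adjacent word pairs in which both words were judged
    correctly spelled and acceptable in context.

    `word_ok` is a list of booleans, one per word in the composition,
    in the order written (a hypothetical encoding of scorer judgments).
    """
    return sum(1 for first, second in zip(word_ok, word_ok[1:])
               if first and second)


# Hypothetical six-word sample in which the third word was misspelled.
judgments = [True, True, False, True, True, True]
print(count_correct_word_sequences(judgments))
```

A single unacceptable word removes two potential sequences (the pair
before it and the pair after it), which is one reason the measure is
sensitive to both spelling and usage.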



[illegible] Language [illegible]

Post-hoc analyses were conducted to determine (a) whether subjects'
expressive language was semantically and syntactically more complex
when tested by a familiar examiner than when tested by an unfamiliar
examiner, and (b) whether the quality of spoken language was related
to fluency. Subjects were 34 preschool children whose speech and/or
language functioning presented a moderate to profound handicap. The
students were enrolled in a special education preschool program within
a large urban midwestern metropolitan school district. The mean age
of the students was 4-9 [illegible]; there were almost twice as many
boys as girls, and minorities represented 3[digit illegible]% of the
sample. All but two subjects performed [illegible] the normal range
on individually administered intelligence tests.

[illegible] multi-categorical scale consisting of salient syntactic
characteristics and semantic relations [illegible] score records of
the subjects' expressive language performance. [illegible]
experienced speech clinicians [illegible] scored 68 protocols
(responses of 34 subjects to familiar and unfamiliar examiners).
Subjects' spoken language was separated into utterances and the raters
assessed the protocols for semantic/syntactic complexity. Subjects'
[illegible] or incorrect with respect [illegible] illustrations being
described. Inter-rater agreement [illegible].

Technical Characteristics of Direct Social Adjustment Measures (RR 24)

During the 1980-81 school year, two studies were conducted to
identify simple and efficient measures of children's social adjustment
and to determine their relationship to other measures of a student's
classroom social status.

In the first study, subjects were 67 third and fourth graders from
three different classrooms in a large metropolitan school district.
Slightly over half of the subjects were boys. Both sociometric status
inventories (roster rating and peer nominations) and teacher rating
scales were used to estimate the social status of the students. Using
an interval recording system, trained observers recorded five
behaviors (initiations by peers to target, one-way and two-way verbal
interactions between peers and target, aversive behavior, ignoring
behavior, inappropriate behavior) in a variety of situations (i.e.,
recess, transition time) in two of the classrooms. Each child was
[illegible]; observers rotated through the class [illegible] second
intervals [illegible] as possible within the observation period. Data
were collected over a three-week period. Observer agreement ranged
from a mean of .68 to .87, depending on which of three reliability
[illegible] was used.

In the third classroom, observers recorded behaviors of students as
they functioned in cooperative groups of four students each;
membership was [illegible] [illegible]-minute observation sessions.
[illegible] during each period. Data were collected on three
behaviors (verbal interaction, [illegible], ignoring) [illegible]
recording system. Data were collected on five separate occasions over
a two-week period.

- - • _ , --^ • ^. 

In the second study, the subjects were 58 students from two
third-grade classrooms in a suburban elementary public school in a
large metropolitan area; 34 students were boys. The sociometric
status instruments and teacher rating scale were similar to those used
in the first study. Observations were made by two trained observers
for two hours per day in [illegible] over a three-week period. Data
were collected on only two [illegible]: (a) frequency of peer talks to
target, and (b) number of different peers with whom interaction
occurred. The observation interval was increased from six to 30
seconds; an event rather than interval recording system was used.
Observer reliability ranged from a mean of .73 to [illegible],
depending upon which of three formulas was applied.

Study of Variables Influencing Social Adjustment Measures (RR 82)

During 1980-81, observations were conducted to identify student
behaviors that relate to students' [illegible] social status within
the group and [illegible] as perceived by the teacher. [Number
illegible] students from seven classrooms that were organized into two
[illegible] were observed over a 10-week period during both informal
and formal school periods. The students attended a midwestern urban
public school. Instruments included a roster and rating sociometric
instrument, a peer nomination procedure, and a school behavior
profile. Behaviors observed in structured academic settings included
noisy, out of place, [illegible], and aggression. Behaviors observed
in unstructured settings differed in that out of place was not
observed and off-task alone was simply alone. Observer reliabilities
ranged from .80 to 1.00 across observational categories.

An event recording system was used by the observers, who moved
through the list of names, observing each student for [illegible]
seconds with a 5-second break between students. Both boys and girls
were observed over 10 weeks during a structured academic period; only
boys were observed during an unstructured lunch period and free time
prior to school. The teachers completed the School Behavior Profile
prior to the behavioral observations. After nine weeks of data
collection, the two sociometric measures were [illegible].

Comparison of Student Self-Management Techniques (RR [illegible])

During 1981-82, the effects of student charting and student
selection of instructional activities were examined. In addition, the
nature of student-selected activities was compared to the nature of
teacher-selected activities. Forty-two elementary resource room
students from a [illegible] the [illegible] teachers who had agreed to
participate in [illegible].

Demonstration Study of Data Utilization (RR [illegible])

During 1978, 52 children in [illegible] who had been previously
identified as learning disabled or [illegible] participated in a study
of two components of formative evaluation (frequency of measurement
and data utilization). The children were enrolled in regular class
programs and were [illegible] reading instruction [illegible] special
education [illegible]. They were selected from [illegible] resource
[illegible] four metropolitan schools in Minnesota.

Four students were selected randomly from each resource teacher's
existing caseload and randomly assigned to either an untreated control
group or one of three experimental treatment groups: (a) pre-post
measurement, non-data-based change, (b) daily measurement,
non-data-based change, or (c) daily measurement, data-based change.
Each group contained 13 students.
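
The assignment procedure described above resembles a blocked random
assignment, with each teacher's caseload as a block. The sketch below
is one way such a procedure could be carried out, not a record of what
the researchers did. It assumes 13 teachers (inferred from 52 students
in groups of 13, not stated in the text) and assumes that the four
students drawn from a caseload each received a different condition.
All names and identifiers are hypothetical.

```python
import random

# The four conditions named in the text.
CONDITIONS = [
    "untreated control",
    "pre-post measurement, non-data-based change",
    "daily measurement, non-data-based change",
    "daily measurement, data-based change",
]


def assign_within_teacher(caseloads, seed=None):
    """Draw four students at random from each teacher's caseload and
    give each one a different, randomly ordered condition.

    `caseloads` maps a teacher label to a list of student labels.
    Returns a dict mapping each selected student to a condition.
    """
    rng = random.Random(seed)
    assignment = {}
    for teacher, students in caseloads.items():
        chosen = rng.sample(students, len(CONDITIONS))
        conditions = CONDITIONS[:]
        rng.shuffle(conditions)
        for student, condition in zip(chosen, conditions):
            assignment[student] = condition
    return assignment


# Hypothetical caseloads: 13 teachers with six students each.
caseloads = {f"T{t}": [f"T{t}-S{s}" for s in range(6)] for t in range(13)}
groups = assign_within_teacher(caseloads, seed=1)
print(len(groups))  # 13 teachers x 4 students selected from each
```

Blocking by teacher in this way guarantees equal group sizes and keeps
teacher differences from being confounded with treatment condition.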

Measures of oral reading rate correct, oral reading rate incorrect,
vocabulary meaning, and comprehension were obtained for all students
both prior to and following treatment. Baseline performance was
established for each student and a 30% [illegible] reading rate
correct was established arbitrarily as the [illegible]-day objective
for students in the experimental conditions.

[illegible] instruction was similar for all [illegible]; it involved
20 minutes of reading instruction [illegible] from the resource
[illegible]. [illegible] and following treatment, students read aloud
for [illegible] each of three placement levels, were asked to define
five words from each story, and were given standardized reading
comprehension measures. The treatment groups differed only in the
frequency of measurement and specific data utilization rule employed.

Analysis of [illegible] Components (RR 12)

During its final year of funding, the Child Service Demonstration
[illegible] Children with Learning Disabilities in the Minneapolis
[illegible] served as a setting for a series of studies on [illegible]
daily data collection procedures, [illegible] students' progress, and
data utilization techniques. A total of 32 students (18 elementary
and 14 secondary) participated in a within-subject design. The
research also served as an early test of the feasibility of
integrating experimental research within existing service programs in
a way that directs and benefits both research and service.

Survey and Observation of Special Education Teachers (RR 81)

During 1982, surveys of 147 special education teachers and
observations of 20 practicing teachers and [number illegible]
cooperating teachers were used to (a) determine the procedures used by
special education teachers in their evaluation of student progress,
and (b) assess the adequacy of those procedures. A one-page survey
was developed to investigate how special educators assess student
mastery of both IEP objectives and instructional material presented in
daily lessons, their confidence in their estimates of student
performance on instructional objectives, and the frequency of
evaluation of student progress toward IEP goals. The surveys were
mailed to members of the Massachusetts Federation of the Council for
Exceptional Children and were to be completed by teachers
[illegible]. The responding teachers were predominantly female and
had taught an average of [illegible] years, with half conducting
resource programs. More than half of the teachers held graduate
[illegible].

Data also were collected from 20 practicing [illegible] and 28
cooperating teachers. [illegible] teacher with the target student. A
lesson plan and behavioral objective with criterion performance were
provided to observers. While the practicing teacher instructed, the
observer recorded the methods employed by the practicing teacher to
assess the student's performance. Following the lesson, the
practicing and cooperating teachers independently rated the success of
the lesson, provided a rationale for their rating, indicated whether
the student mastered the behavioral objective, and estimated the
actual [illegible] of performance [illegible] the objective if the
student failed to master the objective. The accuracy of practicing
and cooperating teachers' estimates of child performance on the
behavioral objective were compared. All trainees and cooperating
teachers were female. The trainees were completing their final
practicum for a special education degree. The cooperating teachers
had taught for an average of 7 years; two-thirds had advanced degrees.
Only two teachers were in a private school setting; the teachers were
in either resource or special, self-contained classrooms.

Surveys of Special Educators (RR 67)

During 1981, three separate groups of teachers were surveyed to
document their familiarity with and use of direct and frequent
measurement of student behavior. Teachers indicating use of the
procedures were asked to specify the amount of time allotted to
measurement of student behavior in their classrooms, while teachers
indicating they did not use the procedures were asked to specify
factors that inhibit their use of the procedures. The specific
questions asked and procedures varied for the three groups of
teachers. The first group included 135 LD teachers who responded to a
postcard survey sent to randomly selected members of the Council for
Learning Disabilities. The overall response rate for this sample was
45.3%. The teachers were from all regions of the U.S. No follow-up
contacts were made. The second survey group involved a national
sample of 128 LD teachers who responded to an in-depth survey (see p.
23). The final sample included 10 special education elementary
resource teachers (2 male, 8 female) in a rural educational
cooperative in the midwest who were required by their special
education directors to employ direct and frequent measures in
conjunction with a research project (see p. 1).

Comparative Study of Reading Domains (RR 55) 

During 1979-80, five special education resource teachers in a
large metropolitan school volunteered to participate in a study
examining the effects of varying the size of the population of words
from which test items for daily testing were sampled. For each
teacher, four students were selected randomly from among those reading
at or between the second and fourth grade instructional levels; the 20
students served as subjects in the study.

Three populations of reading vocabulary words were created using
the Harris-Jacobson Word List. The largest population, called the
Across-Grade list (AG), consisted of the entire pool of words from
preprimer through grade 4. The second population, called the
Grade-Level list (GL), consisted only of those words within the
students' grade level. The third, the Instructional-Level list (IL),
was a subset of 200 words drawn at random from the GL population.
Daily word lists for testing were created by drawing 60 words at
random from each of the three populations; 20 different word lists for
each domain were created by random sampling with replacement.
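
The list-construction step described above can be illustrated with a
short sketch. It rests on one reading of "random sampling with
replacement": words are returned to the pool between lists, so a word
may appear on several daily lists but not twice on the same list.
That interpretation, the function name, and the placeholder word pools
are assumptions, since the text does not spell out the procedure.

```python
import random


def make_daily_lists(population, n_lists=20, list_size=60, seed=None):
    """Build `n_lists` daily test lists of `list_size` words each.

    Each list is drawn without repeats from the full `population`
    (a list of words); the pool is restored before the next draw, so
    lists may overlap with one another.
    """
    rng = random.Random(seed)
    return [rng.sample(population, list_size) for _ in range(n_lists)]


# Placeholder pools standing in for the GL and IL word populations.
rng = random.Random(0)
grade_level_pool = [f"word{i}" for i in range(500)]
instructional_pool = rng.sample(grade_level_pool, 200)  # IL: 200 from GL

il_lists = make_daily_lists(instructional_pool, seed=0)
print(len(il_lists), len(il_lists[0]))
```

Because the IL pool holds only 200 words, its 60-word lists overlap
far more from day to day than lists drawn from the larger AG pool,
which is the contrast the study was designed to examine.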

The appropriate grade level for instruction was determined for
each student. Students were instructed individually for 10 minutes
daily on 200 words from this instructional level. Following each
instructional period the student took a 30-second word reading test on
each of the three populations of words using the daily tests and lists
that had been created. Teachers recorded the number of words read
correctly and incorrectly on each type of word list. Throughout the
study, the students' performance graphs were evaluated weekly to
determine the need for an instructional modification. After 15 days,
an instructional change was required.

Study of Alternative Reading Performance Criteria (RR 59)

During 1980-81, analyses of the technical adequacy of informal
reading inventories were conducted using data from 91 randomly
selected students, distributed across grades 1-6 in a midwestern
metropolitan elementary school. All students were English speaking,
15 received special education resource service, and 23 were enrolled
in Title I programs for children who were "seriously behind" in
reading. Correlational and congruency analyses were conducted to
determine the technical adequacy of (a) choosing a criterion of 95%
accuracy for word recognition to determine an instructional level, (b)
arbitrarily selecting a passage to represent the difficulty level of a
basal reader, and (c) employing one-level floors and ceilings to
demarcate levels beyond which behavior is not sampled.
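
To make the 95% criterion concrete, the sketch below applies it to a
set of passage results. The decision rule used here (take the highest
level at which word recognition accuracy meets the criterion) is an
assumption adopted for illustration; the text names the criterion but
not the exact placement rule, and the function name and sample data
are hypothetical.

```python
def instructional_level(results, criterion=0.95):
    """Return the highest level whose word recognition accuracy meets
    the criterion, or None if no level does.

    `results` is a list of (level, words_correct, words_attempted)
    tuples ordered from the easiest level to the hardest.
    """
    best = None
    for level, correct, attempted in results:
        if attempted > 0 and correct / attempted >= criterion:
            best = level
    return best


# Hypothetical results for one student on three passage levels.
sample = [(1, 99, 100), (2, 96, 100), (3, 90, 100)]
print(instructional_level(sample))
```

With a single passage per level, a difference of one or two words can
move a student across the 95% line, which is part of what the
analyses of technical adequacy were examining.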

Study of Curriculum Differences (RR 93)

The performances of 660 elementary students in six school
districts on two curriculum-based reading aloud tasks and one
non-curriculum-based measure were examined. Four different basal
reading series were compared. The two basal reading measurement tasks
consisted of a reading passage and a vocabulary word list; the
non-curriculum measure was a word list. The 660 students, who were
selected randomly, attended schools in a rural midwestern education
cooperative. No attempt was made to obtain equal representation of
males and females.

The testing of the 660 students was conducted within the first
month of school by 10 trained educational aides. All testing was
completed on an individual basis and involved the administration of
two one-minute oral reading passages, one basal word list, and the
non-curriculum word list. The order of administration of the three
measures was counterbalanced.
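
Counterbalancing the order of three measures can be done in several
ways. The sketch below shows complete counterbalancing, in which all
six possible orders are used equally often by cycling through them.
This scheme is an assumption chosen for illustration; the text does
not say which counterbalancing scheme was used, and the function name
and student identifiers are hypothetical.

```python
from itertools import permutations

# The three measures named in the text.
MEASURES = (
    "oral reading passages",
    "basal word list",
    "non-curriculum word list",
)


def counterbalanced_orders(student_ids):
    """Assign each student one of the six possible administration
    orders, cycling through the orders so each is used equally often
    when the number of students is a multiple of six.
    """
    orders = list(permutations(MEASURES))
    return {sid: orders[i % len(orders)]
            for i, sid in enumerate(student_ids)}


schedule = counterbalanced_orders(range(660))
print(schedule[0])
```

With 660 students, each of the six orders would be used 110 times
under this scheme, so practice and fatigue effects are spread evenly
across the three measures.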

Analysis of Readability Formulas (RR 129)

During 1981-82, 285 special education students in grades 1-9 were
tested twice on three passages of a Passage Reading test. The
students were from either rural and suburban Minnesota (n=117) or from
New York City (NYC). Six readability formulas were applied to the
three passages to examine the agreement among the formulas. In
addition, difficulty rankings by the formulas were compared to
rankings produced by students' actual performance. Student
performance in the two settings also was compared to explore the
contribution of pupil background to text difficulty.

Analyses of Basal Reader Criterion-Referenced Tests (RR 113, 122, 128,
131)

Four studies were conducted during 1982-83 on the technical
adequacy of the criterion-referenced tests associated with basal
readers commonly used in public schools. In each study, students'
performance on basal reader tests was compared to their performance on
a standardized test and a direct measure word reading test. Various
analyses were then conducted on the data to examine the technical
adequacy of the basal reader tests. The specific subject samples and
tests included in each study are detailed below.

Houghton-Mifflin Basic Reading Test (RR 113). Subjects were 47
sixth graders who were tested on the SRA Reading Achievement Test, the
Houghton-Mifflin End-of-Level 11 Test, and the Word Reading Test. A
subgroup of 29 students was tested a second time on the Basal Reading
Test. All students were from a school district in a rural midwestern
cooperative.

Ginn 720 Series Mastery Test (RR 122). Subjects were 47 fifth
graders who were tested on the SRA Reading Achievement Test, the Ginn
720 End-of-Level 11 Mastery Test, and the Word Reading Test. A
subgroup of 22 students was tested a second time on the Mastery Test.
All students were from a school district in a rural midwestern
cooperative.

Scott-Foresman Criterion-Referenced Test (RR 128). Subjects were
25 fourth graders in a rural educational cooperative who were tested
on the SRA Reading Achievement Test, the Scott-Foresman End-of-Book 9
Criterion-Referenced Test, and the Word Reading Test. All students
were tested a second time on the Criterion-Referenced Test.

Holt Basic Reading Series Management Program Level 13 Test (RR
133). Subjects were 2^ fourth graders in a rural educational
cooperative who were tested on the SRA Reading Achievement Test, the
Holt Basic Reading Series Management Program Level 13 Test, and the
Word Reading Test. All students were tested a second time on the
Management Program Test.

Measuring Classroom Behavior (RR 6)

Observations were conducted on 11 students enrolled in a
midwestern inner city elementary school. These students had been
identified by their teachers as ones having the most difficulty
adjusting socially. Students were observed during periods of
structured academic work. A sample of 10 peers was observed during
the same observation period, producing 10 minutes of data on each
target student and 10 minutes of data on each target student's peers.
Observations focused on five categories of behavior: (1) noise, (2)
out of place, (3) physical contact or destruction, (4) off task, and
(5) other.

Comparative Study of Graph Papers (RR 101)

During 1980-81, student performance data on direct, repeated
measures of reading and written expression were collected over a
2½-month period for 83 low-achieving elementary students identified
during the screening of all 785 elementary students from grades 3-5
enrolled in three rural elementary schools. The students had no
history of special education services, but scored at or below the 15th
percentile on a short duration measure of written expression that
significantly discriminated LD and non-LD students. The students (32
were females) were fairly evenly distributed across grades 3-6.

The students were administered two tasks on a weekly basis for 10
weeks. First, students were asked to read aloud for one minute from a
third-grade list of words. The numbers of words read correctly and
incorrectly were scored and graphed. Students in grades 4-5 also read
a list of words selected from their grade levels. Second, story
starters were used weekly to obtain writing samples from the students.
These were scored for Total Words Written, Words Written Correctly,
Words Written Incorrectly, and Correct Letter Sequences Written.

A computer program was used to simulate charting on both interval
and semi-logarithmic graphs. Each student's data were entered into
the computer at the end of the seventh week; the slope of each
student's performance on the two types of graphs was used to predict
student performance at weeks 8, 9, and 10 of the data collection
period. The estimates of student performance at the three times were
contrasted with the actual data collected at weeks 8, 9, and 10 by
determining the absolute deviation between the scores. The graphing
approach with the smaller average deviation score was considered to be
the one making better predictions of student performance.
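
The comparison described above can be sketched in a few lines. The report does not say how the slope was computed, so this sketch assumes an ordinary least-squares line fitted to weeks 1-7, on the raw scores for the interval graph and on base-10 logarithms of the scores for the semi-logarithmic graph. The function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (an assumed method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_deviation(scores, n_fit=7, semilog=False):
    """Fit on the first n_fit weeks, predict the rest, average |error|."""
    weeks = list(range(1, len(scores) + 1))
    fit_scores = scores[:n_fit]
    if semilog:
        # Semi-log chart: the line is straight in log10(score).
        # Scores must be positive for the logarithm to exist.
        fit_scores = [math.log10(s) for s in fit_scores]
    slope, intercept = fit_line(weeks[:n_fit], fit_scores)
    deviations = []
    for week, actual in zip(weeks[n_fit:], scores[n_fit:]):
        predicted = slope * week + intercept
        if semilog:
            predicted = 10 ** predicted
        deviations.append(abs(predicted - actual))
    return sum(deviations) / len(deviations)


def better_chart(scores):
    """Name the chart type with the smaller average deviation."""
    interval = mean_abs_deviation(scores, semilog=False)
    semilog = mean_abs_deviation(scores, semilog=True)
    return "interval" if interval <= semilog else "semi-logarithmic"
```

For one student's ten weekly scores, `better_chart(scores)` names the chart type whose week 1-7 trend line came closer to the actual scores at weeks 8, 9, and 10. A score of zero would need special handling on the semi-logarithmic scale, which this sketch does not attempt.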

Assessment of Alternative Data Summary Procedures (RR 112, 118)

A study of two basic procedures for analyzing time series data
(visual analysis and statistical analysis) was conducted during
1981-82. Student performance represented on 28 hypothetical graphs
was evaluated by 52 in-service and pre-service teachers from three
locations around a large midwestern city. The slope and variability
of data presented in the graphs were varied systematically. In
addition, two other conditions (training in data utilization and use
of aimlines) were varied in the study. Subjects evaluated each graph
in terms of the effectiveness of the program depicted on the graph,
and also indicated what about the data supported each judgment.
Statistical analyses also were conducted on the data presented in each
graph.

Evaluating Program Effectiveness (RR 123)

During 1982-83, a system-level analysis of the effectiveness of
special education was conducted in an educational cooperative
comprised of six school districts. A total of 96 special education
students in grades 1-5 were assessed three times during the year on
direct, curriculum-based measures of achievement in reading, math, and
spelling. Analyses of student performance data were conducted across
all six districts, for each district, by teachers, and by student
classification (LD or EMR), grade, and sex. All measurement materials
were developed from the curricula in use in the school districts.

Analysis of Statistical Properties of Data (RR 125, 138)

During 1981-82, reading performance data were collected on 68
resource room students over a period of six months. The grade 1-7
students were from four Minnesota school districts. All were
participating in research on the effects of teachers using frequent
curriculum-based measures of student performance when the data were
collected. The students' data were subjected to further analysis in
this investigation. First, the slope, standard error of estimate,
mean level of performance, and number of data points were calculated
for each graph to document the characteristics of the time-series data
collected through frequent curriculum-based measurement. Second, a
principal components factor analysis was performed to summarize
relationships among the time-series properties and properties of the
measurement system. In addition, multiple regression analyses were
used to identify the relationship of such variables to achievement.
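
The four per-graph descriptors named above can be illustrated with a short sketch. It assumes the slope comes from an ordinary least-squares line through the data points and that the standard error of estimate uses n - 2 degrees of freedom; the report does not spell out either choice. The function name and dictionary keys are hypothetical.

```python
import math


def series_summary(days, scores):
    """Slope, standard error of estimate, mean level, and number of
    data points for one student's time-series graph."""
    n = len(scores)
    if n < 3:
        raise ValueError("need at least 3 data points")
    mean_x = sum(days) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted trend line.
    ss_res = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(days, scores)
    )
    see = math.sqrt(ss_res / (n - 2))
    return {"slope": slope, "see": see, "mean": mean_y, "n": n}
```

Calling `series_summary(days, scores)` once per student would give one row per graph, which is the kind of table a factor analysis or regression could then take as input.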

Study of Self-Instructional Training (RR 63)

Eight special education resource teachers pilot tested a manual
designed to train teachers to use direct and frequent measurement
techniques to monitor students' progress toward individualized goals
and to evaluate the effectiveness of the students' instructional
programs. All teachers were certified in special education and two
held graduate degrees. The teachers, whose teaching experience ranged
from 1 to 35 years, taught in a suburban school district.

Causal Model Analysis (RR 105)

Causal modeling techniques were used to examine the relationships
among implementation of a formative evaluation system, structure of
instructional programs, and reading achievement for 117 students in
grades 1-7. Most of the students were boys in grades 2-5; their
average age was 9.5 years. For the most part, they received special
education services in resource rooms. The 31 teachers were
predominantly female and had an average of 8.8 years teaching special
education. The greatest percentage of teachers had no experience
teaching regular education.

Three major types of measures were employed. The measure of the
degree of implementation of the monitoring system (Accuracy of
Implementation Rating Scale - AIRS) and the measure of the degree of
structure of the students' instructional programs (Structure of
Instruction Rating Scale - SIRS) were used to determine how the
evaluation system influences teaching practices. Both scales involve
observation and the rating of multiple items on a 1 (lowest) to 5
(highest) scale. The third set of measures were student achievement
indices. At three different points in time during the study
(separated by approximately two months each and synchronized with AIRS
and SIRS observations), three one-minute oral reading measures were
administered to the student. Posttest measures included two subtests
from a standardized reading test.

Three formats were used to train the teachers to carry out a
specific set of procedures that included establishing an appropriate
reading measurement level, writing long-range goals and short-term
objectives, administering direct reading measures, graphing, and data
utilization in making decisions about the effectiveness of students'
reading instructional programs. The training formats included: (a)
three half-day workshops at the beginning of the school year
supplemented by a training manual and research feedback, (b) training
by district personnel with the aid of the same manual, supplemented by
phone contact with the researchers, and (c) one week of full-day
workshops and periodic ongoing inservice.

Instructional Rating Scale Validation (RR 107)

During 1981-82, a bi-polar rating scale was developed for use in
an experimental study on the effects of teachers using direct and
frequent measurement of special education students' reading
performance. The scale was developed as a measure to monitor the
structure of instruction provided to target students; it included
variables identified in educational literature as important in
predicting classroom achievement. Data collected from 158 elementary
school children in four school districts were analyzed to examine the
technical characteristics of the scale. The data were examined in
terms of reliability and evidence of a consistent factor structure.